RAMPS ©: Reliability, Availability, Maintainability, Predictability, Scalability. Presented by Joe Soroka.

1 RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability. Presented by Joe Soroka. For additional information visit

2 Reliability, Availability, Maintainability, Predictability, Scalability – While budgets may be tighter, the requirement for maximum uptime has not gone away – The design of your facility is only one piece of the pie that will affect your site’s uptime – It is important to be aware of how reliability, availability, maintainability, predictability and scalability all affect your site’s uptime

3 RELIABILITY Reliability is the ability of a system to perform and maintain its functions in routine circumstances, as well as in hostile or unexpected circumstances.

4 Reliability What is reliability? – Weibull – Markov – Reward modeling – Modeling – IEEE Gold Book – Procedures: accurate, confirmed/tested – Equipment selection: generator, UPS systems, EPO systems, switchgear, monitoring systems

5 Reliability – Reliability modeling – Equipment – Commissioning – Operations & maintenance

6 Reliability Bathtub curve of reliability – Infant mortality: burn-in/load testing, commissioning – Useful life: proper maintenance – End of life: identify and replace equipment prior to entering this period
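The three regions of the bathtub curve are often described with a Weibull hazard (failure rate) function, which ties in with the Weibull modeling listed on slide 4. A shape parameter below 1 gives a falling failure rate (infant mortality), a shape of 1 gives a constant rate (useful life), and a shape above 1 gives a rising rate (end of life). The sketch below is illustrative only: the shape and scale values are assumptions, not data from this presentation.

```python
def weibull_hazard(t_hours, shape, scale_hours):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    if t_hours <= 0 or shape <= 0 or scale_hours <= 0:
        raise ValueError("t_hours, shape and scale_hours must be positive")
    return (shape / scale_hours) * (t_hours / scale_hours) ** (shape - 1)


# Hypothetical shape parameters chosen only to show the three bathtub regions.
PHASES = {
    "infant mortality": 0.5,  # shape < 1: failure rate falls with time
    "useful life": 1.0,       # shape = 1: failure rate is constant
    "end of life": 3.0,       # shape > 1: failure rate rises with time
}

for name, shape in PHASES.items():
    early = weibull_hazard(1_000, shape, 50_000)
    late = weibull_hazard(40_000, shape, 50_000)
    if late < early:
        trend = "falling"
    elif late > early:
        trend = "rising"
    else:
        trend = "constant"
    print(f"{name}: {trend} failure rate")
```

Burn-in and commissioning aim to push the falling-rate region into the factory or the test period, so that equipment enters service already in its useful life.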

7 Reliability – The reliability of a series system is no greater than that of its weakest component – In a complex system, you need to identify and quantify the importance of each component – A reliability block diagram is a graphical representation of the components of the system and how they are related in terms of reliability
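The arithmetic behind a reliability block diagram can be sketched in a few lines. In a series path every block must work, so the reliabilities multiply and the result can never exceed the weakest block. Redundant (parallel) blocks fail only if all of them fail. The component values below are hypothetical, and the parallel formula assumes independent failures and that either unit can carry the full load.

```python
from math import prod


def series_reliability(reliabilities):
    """All blocks must work: R = R1 * R2 * ... * Rn."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one block must work: R = 1 - (1 - R1) * ... * (1 - Rn)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical one-year reliabilities, for illustration only.
utility, ats, ups, pdu = 0.95, 0.995, 0.98, 0.999

single_path = series_reliability([utility, ats, ups, pdu])
redundant_ups = series_reliability(
    [utility, ats, parallel_reliability([ups, ups]), pdu]
)

print(f"single path:   {single_path:.4f}")
print(f"redundant UPS: {redundant_ups:.4f}")
```

Comparing the two results block by block is one way to quantify the importance of each component: the block whose improvement moves the system figure most is the one to address first.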

8 Reliability – Many of the reliability design ideas share a common philosophy with those recommended for availability, because the two are closely related – While reliability is about how long a system runs between failures, availability is about the ability of a system to tolerate failures and how long it remains accessible to its users – When a system's components and services are highly reliable, they cause fewer failures from which to recover and thereby help increase availability
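One common way to express this relationship is the steady-state availability formula, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. The formula is a standard textbook relation rather than something stated on the slides, and the hours used below are hypothetical.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Fraction of time a repairable system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, for illustration only.
baseline = steady_state_availability(mtbf_hours=10_000, mttr_hours=8)
more_reliable = steady_state_availability(mtbf_hours=20_000, mttr_hours=8)
faster_repair = steady_state_availability(mtbf_hours=10_000, mttr_hours=4)

print(f"baseline:      {baseline:.6f}")
print(f"more reliable: {more_reliable:.6f}")
print(f"faster repair: {faster_repair:.6f}")
```

In this model, doubling the time between failures and halving the repair time raise availability by the same amount, which is why reliability and maintainability both matter for uptime.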

9 Reliability: Equipment – Major manufacturers: past experiences, local maintenance support, parts distribution centers – Fine line between leading edge and bleeding edge – Formal submittal review meetings

10 Reliability: Equipment – Generator isolation valves – ATS bypass – TVSS indicators and alarms – Lightning protection – EPO systems: wiring, control relays, covers, diagrams, testing, day 2 changes

11 Reliability: Equipment Generators – Redundant batteries – Battery monitoring – Fuel level monitoring – Water jacket heater isolation valves – Silicone heater hoses – Coolant level pre-alarms, both cores – Water separators (Racor filters) with alarms – Engine diagnostic link

12 Reliability: Equipment UPS systems – Dual input – Maintenance bypass cabinet – Advanced monitoring – Battery monitoring – Redundant battery strings for VRLAs – Site-specific procedures

13 Reliability: Equipment Automatic Transfer Switches (ATS) – Maintenance bypass or wrap-around breakers – Phase sync monitoring – Pause neutral/dual solenoids – Monitoring. Transient Voltage Surge Suppression (TVSS) – Monitoring – Indication of operation – Surge counter

14 Reliability: Equipment EPO systems – Wiring in conduit, not in open plenum – Control relay coils should not be energized until activation – Secondary covers installed over the EPO buttons – Detailed and accurate schematic diagrams – System should be designed so it can be tested – System should allow day 2 changes to be made without risk – Part of an engineered drawing, not a cloud saying “by others”

15 Reliability: Equipment Thermal runaway – Increased heat density: reduces time to thermal runaway; increases the need for a reliable HVAC system; specialized HVAC systems; possibly switching from emergency to UPS power; long UPS battery runtimes may be unclear. Rack layout, equipment airflow direction – Cold/hot aisle – Enclosed hot aisles. Rack type – Doors – Vents – Fans

16 Reliability: Equipment Water storage – Chilled water: in the event of a power outage or temporary chiller failure, do you have the capability to ride through? – Makeup water: how reliable is the city water supply? Do you have diverse sources (water storage tanks, well, other water sources)?

17 Reliability: Commissioning – With each project being unique, there is a need to determine how much commissioning is appropriate for the project. Factors that influence this decision include: the building’s mission-criticality; the facility’s use or purpose; complexity of the building’s systems; building type and size; project type (existing building system, retrofit, or both); building tenant or occupant demographics; system reliability requirements; the owner’s objective in commissioning the building (IAQ, system reliability and/or energy efficiency); project budget

18 Reliability: Operation and Maintenance – Use a pilot/copilot approach: commercial airplanes do not fly with just one pilot, so why would you? – Standardize as much as possible: standard procedures, standard process – Use a Computerized Maintenance Management System (CMMS): timely reports and schedules, accurate information, archive of past performance, instant access to information

19 AVAILABILITY Availability is the ability of a system to tolerate failures. It refers to the time that a system is available to its users. This means the process continues to be served through the failure and that, ideally, the failure is transparent to the user.

20 Availability – Design – Resources – Procedures

21 Availability Availability is typically expressed by the number of nines. Downtime per year:

Availability | # of nines | Downtime
90% | 1 nine | 36.5 days/year
99% | 2 nines | 3.65 days/year
99.9% | 3 nines | 8.76 hours/year
99.99% | 4 nines | 52 minutes/year
99.999% | 5 nines | 5 minutes/year
99.9999% | 6 nines | 31 seconds/year
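The downtime figures follow directly from the availability percentage: downtime per year = (1 - availability) x hours in a year. A minimal sketch, assuming a 365-day year (leap years ignored); the function names are illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours_per_year(availability_percent):
    """Expected downtime in hours per year for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def describe(hours):
    """Format a duration in the largest unit that keeps the value >= 1."""
    if hours >= 24:
        return f"{hours / 24:.2f} days"
    if hours >= 1:
        return f"{hours:.2f} hours"
    if hours * 60 >= 1:
        return f"{hours * 60:.1f} minutes"
    return f"{hours * 3600:.1f} seconds"


for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {describe(downtime_hours_per_year(pct))}/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from four to five nines is far harder to achieve than the step from two to three.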

22 Availability Failures can be attributed to the following causes: Design failures – This class of failures takes place due to inherent design flaws in the system. In a well-designed system, this class of failures should make a very small contribution to the total number of failures. Infant mortality – This class of failures causes newly manufactured hardware to fail. It can be attributed to manufacturing problems such as poor soldering, leaking capacitors, etc. – These failures should not be present in systems leaving the factory, as these faults will show up in proper factory system burn-in tests

23 Availability Random failures – Random failures can occur during the entire life cycle of a system and can lead to system failures. Redundancy is provided to recover from this class of failure. Wear-out – Once a hardware module has reached the end of its useful life, degradation of component characteristics will cause it to fail. These types of faults can be weeded out by preventive maintenance and routing of hardware

24 Availability: Design – Designing systems with sufficient levels of redundancy – Eliminating single points of failure – Availability design guidelines: consult your engineer; TIA standard TIA-942; Uptime Institute tier definitions

25 Availability: Design System design should have multiple paths – Active or passive, depending upon the site reliability requirements – If redundant paths need to be value-engineered (VE’d) out to meet the project budget, consider adding the breaker or valve now and the actual feed later, when the budget allows – By adding the breaker or valve up front, you will be able to install temporary cable or piping when an emergency arises

26 Availability: Design When performing maintenance that decreases system redundancy, move the reduction in availability away from the critical load and toward the utility as much as possible – For example, if you have a system-plus-system design and are going to take the UPS out of service for maintenance, do not just open the UPS system and let the downstream dual-cord devices and static transfer switches handle the loss of redundancy – Place the UPS in maintenance bypass to continue feeding the second source with stable power – Better yet, place the UPS on generators or an alternate UPS supply to avoid sending unprotected utility power to the critical load

27 Availability: Resources Technical resources – Operation staff – Response staff – Maintenance & repair staff. Parts – Onsite spares – Manufacturer spares – Vendor spares – Supply houses

28 Availability: Resources, Operation Staff – Whether you are using in-house or contracted staff, it is important to ensure they have the proper resources – Proper access to the facility: if using a key card system, what happens when the card readers lose power? Who has the keys? – Do you have all of your operation staff’s phone numbers (cell and home) and emails (company and personal)?

29 Availability: Resources, Response Staff Emergency response – Types of emergency responses: additional operation staff; electrical, mechanical & plumbing contractors; general construction; testing and repair firms; fire and security; hazardous material spill – List of suppliers and vendors: emergency contact information, alternate contact information – Contracts in place to execute after-hours support – Meet them before an emergency arises; have them at the site for lunch

30 Availability: Resources, Maintenance & Repair Staff – Do you have the necessary contracts in place? – Is there maintenance your operation staff can perform in-house? – Do you have alternate contact numbers for your maintenance providers? – Do they have proper access to the facility? – Do you have a second string waiting on the sidelines in case of an emergency?

31 Availability: Resources, Parts Parts and supplies – Define and assess critical parts – Stock critical parts onsite; have an annual budget for spare parts that increases a little each year – Verify that your vendors and contractors have spare parts handy – Identify supply houses and suppliers that have the parts you need – Have after-hours phone number(s) to get parts from supply houses – Have contracts in place and make sure they are active

32 Availability: Procedures – Operation – Maintenance – Emergency – Troubleshooting

33 Availability: Procedures, Operation Operation procedures – Have detailed procedures developed specifically for your site – Procedures should be tested and verified – Procedures should be inventoried and updated regularly – Operating procedures should be placed at the point of use and not locked up in the building manager’s office

34 Availability: Procedures, Maintenance Maintenance procedures – Have detailed procedures for maintenance – Ask your maintenance provider to furnish all of the required maintenance procedures prior to performing maintenance, so you can review and comment on them – Use detailed procedures during your maintenance activities – Review procedures after the maintenance has been completed

35 Availability: Procedures, Emergency Emergency procedures – In case of an emergency, where are your procedures? – Can you access them? – Are they at multiple locations? – An emergency is not the time to try to figure out how to restore a system – Perform dry runs of the procedures at least once a year – Update and change as required

36 Availability: Procedures, Troubleshooting – Manuals: available, correct – Drawings: available and complete, as-builts – Develop troubleshooting flow diagrams

37 MAINTAINABILITY Maintainability is defined as the probability of performing a successful repair action or preventative maintenance within a given time. In other words, maintainability measures the ease and speed with which a system can be restored to operational status.
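As a probability, maintainability can be put into numbers once a repair-time model is chosen. A common simplifying assumption, not stated in the presentation, is that repair times are exponentially distributed with a known mean time to repair (MTTR), which gives M(t) = 1 - exp(-t / MTTR). The MTTR value below is hypothetical.

```python
import math


def maintainability(t_hours, mttr_hours):
    """Probability a repair finishes within t_hours, assuming exponential repair times."""
    if t_hours < 0 or mttr_hours <= 0:
        raise ValueError("t_hours must be non-negative and mttr_hours positive")
    return 1.0 - math.exp(-t_hours / mttr_hours)


# Hypothetical MTTR of 2 hours, for illustration only.
for allowed_hours in (1, 2, 4, 8):
    probability = maintainability(allowed_hours, mttr_hours=2)
    print(f"repair done within {allowed_hours} h: {probability:.1%}")
```

Under this model, anything that shortens MTTR (onsite spares, good access, tested procedures) raises the chance of finishing a repair inside a fixed maintenance window.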

38 Maintainability – Design – Equipment – Staff – Location – Maintenance program – Training – Coordination – Maintenance windows

39 Maintainability: Design Goals of maintainability – Maximize efficiency and accuracy of on-line replacement of system components – Facilitate troubleshooting and minimize troubleshooting time at each level of maintenance activity – Allow test, checkout, troubleshooting and repair procedures to be unit-specific and structured to aid in identification of faulty units, then sub-units – Reduce downtime – Provide easy access to malfunctioning components – Allow for a high degree of standardization – Minimize time and cost of maintenance training – Simplify new equipment design and shorten design time by using previously developed, standard building blocks

40 Maintainability: Design – Equipment access – Labeling – Minimize troubleshooting time: monitoring, procedures, standardization, test and service points

41 Maintainability: Design, Equipment Accessibility Accessibility refers to the relative ease with which a system can be accessed – Sufficient clearance to use the tools needed to complete the tasks – Adequate space to permit convenient removal and replacement of components – Adequate visual exposure to the task area – Adequate safety and working clearances – Adequate space for required rigging equipment – Adequate hallway, corner and door clearances back to the loading dock

42 Maintainability: Design, Ease of Removal and Replacement Equipment rooms should be designed so that rapid, safe and easy removal and replacement of malfunctioning components can be accomplished by one technician, when possible

43 Maintainability: Design, Labeling Labeling should: – Identify a specific device – Identify the purpose or function of a specific device – Present critical information – Present safety information – Be legible – Use contrasting colors. Control your labeling to ensure its accuracy and standardization, with periodic inspections and examinations

44 Maintainability: Design, Minimize Troubleshooting Time – Comprehensive monitoring – Procedures – Standardization – Test and service points

45 Maintainability: Design, Monitoring Monitoring capabilities – Event notification – Event reconstruction – Event mitigation – Determine maintenance frequencies – Allow for accurate and efficient communication of events

46 Maintainability: Design, Monitoring What type of monitoring system do I need? – No monitoring: not recommended for any mission-critical facility – Remote Alarm Status Panel (RASP): no trending or time stamping; gives visual and audible notification; usually for one device or system – Monitoring with dry contacts: limited number of points; limited time stamping; status is either on or off – Serial interfaces: comprehensive data; data points with values rather than on/off; flexible and expandable

47 Maintainability: Design, Procedures Emergency Operating Procedures (EOP) – Developed for failure modes – Readily available for use; locate at point of service – Should be developed and tested during the commissioning phase – Detailed to switch level – Update with any changes discovered. Method Operating Procedure (MOP) – Developed for all operations – Detailed to switch level – Have back-out procedures included – Use with pilot/copilot approach – Update with any changes discovered – Should be developed and tested during the commissioning phase

48 Maintainability: Design, Procedures Troubleshooting procedures – Troubleshooting flow charts – Restoration procedures. Maintenance procedures – Detailed procedures – Include measurement points for future trending – Used and completed during maintenance

49 Maintainability: Design, Procedures Common procedure error traps – In-field decisions – Vague instructions – Undefined or uncommon terms – Burdensome or complex instructions – Multiple actions – Inconsistent statements or actions – Misleading or missing critical information – Interfacing with external procedures – Lack of ownership – Lack of quality assurance review

50 Maintainability: Design, Standardization Standardization ensures consistency and comparability of knowledge and parts – Acronyms: reduce confusion – Manufacturers: reduced spare part counts; familiarization with operations and maintenance – Layouts: reduce confusion; increase ease of use – Labeling: reduce confusion

51 Maintainability: Design, Test and Service Points – Test points provide a means for conveniently and safely determining the operational status of equipment and isolating malfunctions – Test points, strategically placed, make signals available to the technician for checking, adjusting or troubleshooting – Service points provide a means for lubricating, filling, draining, charging and similar functions

52 Maintainability: Design, Test and Service Points General principles for test and service points – Avoiding the need for frequent testing and service – Standardization – Test and service point compatibility – Labeling dangerous test and service compatibility – Distinctively different connectors and fittings – Location of test, service and adjustment points

53 Maintainability: Equipment – Ordering the right accessories with your equipment can make a big difference when it comes to the maintainability of your equipment – When ordering equipment or reviewing design documents, solicit input from the operations and maintenance staff involved – It’s much cheaper to order it right the first time than to upgrade it later in the field

54 Maintainability: Equipment, Generators – Water separators for fuel – Radiator water level – Isolation valves on water jacket heaters – Generator-mounted circuit breakers – Battery cables – Battery monitoring – Fuel-level monitor

55 Maintainability: Equipment, Switchgear – Annual infrared thermal scanning – Protective relays – Breaker testing – PLC code: hard copy, uploadable copy – Beware of small UPS systems – Station batteries – Internal cleaning – Mimic bus

56 Maintainability: Equipment, Automatic Transfer Switches Maintenance bypass – Order the ATS with a maintenance bypass, or design the system with a manually operated breaker bypass that wraps around the ATS to both sources

57 Maintainability: Equipment, UPS Systems – AC filter capacitors: 3-5 years – DC filter capacitors: 3-5 years – Transfer circuits: capture the transfer between UPS and bypass – Procedures: detailed PM procedures; capture before and after readings – Calibration/maintenance: capture details; don’t just do a “dust and clean” PM

58 Maintainability: Equipment, Batteries – VLA (flooded): vented lead acid; quarterly maintenance – VRLA (sealed): valve-regulated lead acid; semi-annual maintenance – Float voltage – Room temperature – Proper maintenance – Water as required – Battery monitoring – Batteries are found in: UPS systems, generators, switchgear, PLCs and breakers, telecom equipment

59 Maintainability: Equipment, PDUs – Shutdown alarms: identify and understand them – EPO circuits: if used, are they maintainable? – Monitoring: main, sub-panels, branch circuit breakers – Snap-in vs. bolt-in breakers: use bolt-in breakers only – Transformers: K-rated

60 Maintainability: Equipment, Load Banks – Permanently installed load banks – Generator testing: annual load test, troubleshooting – UPS system testing: annual load test, troubleshooting – Paralleling gear: set-up and calibration, troubleshooting

61 Maintainability: Equipment, Water Source – The alternate water source needs to be capable of supplying water so that the primary water source can be removed for maintenance – Usage metering should be on each water source – Types of alternate water source: city water, wells, storage tanks

62 Maintainability: Equipment, Pumps – Alignment: reduces wear and tear on shafts, bearings and seals; reduces vibration; decreases current draw – Bearings: accessible grease fittings; grease as required – Infrared thermal scanning: motor problems, alignment issues

63 Maintainability: Equipment, CRAH/CRAC – Temperature and humidity set points: should be set the same – Humidifiers: have replacements for bulbs and canisters – Filters: use a pre-filter in dirty locations; make sure your dirty-filter differential pressure (DP) switch is set correctly – Alignment: proper alignment will reduce wear on the shaft and bearings – Bearings: grease when required; infrared thermal heat scan – Refrigerant leaks can activate fire alarms

64 Maintainability: Staff Dispatched service – Verify your vendor’s qualifications as a company – Request resumes of the people performing work at your site – Review their technical aptitude – Verify your vendor’s training programs. Onsite operation and maintenance staff – Verify that they are managed correctly (in-house or contracted) – Verify your staff’s resumes and qualifications – Review their technical aptitude – Verify training programs

65 Maintainability: Location – The location of, and access to, valuable resources is important when situations arise; 3:00 am on a Sunday morning is not the time to try to locate the fuses required to get your site up and running – There are various resources you should consider before the need arises: equipment, technicians, parts, procedures, manuals, drawings

66 Maintainability: Training – It is important that your operation and maintenance staff is adequate and regularly trained – When an emergency occurs, they should have the confidence and experience to complete the task at hand – Available training methods: self-paced, classroom, web-based, manufacturer’s training, on-the-job training, procedure development, training module development, test beds, simulators

67 Maintainability: Coordination Work activities – It is important to closely coordinate maintenance activities to maintain a reliable, efficient and safe working environment – During outage windows we have a tendency to plan too many activities at once; make sure you don’t have too many people working in the same space at once

68 Maintainability: Coordination Pay particular attention to the planning of your maintenance activities – CRAC units: refrigerant leaks will activate the fire systems; make sure you disable the fire system* prior to charging a system – Under-floor cleaning: can activate the fire alarm system; make sure you deactivate the fire alarm system* before you start to clean under the floor – There are other maintenance activities and tests that could mistakenly set off the fire alarm system. * When you disable a fire alarm system, make sure you follow the procedures required by OSHA, NFPA, local authorities, your company and your insurance underwriter. This could include, but is not limited to: additional fire extinguishers, posting a fire watch, notification, special procedures, and tagging

69 Maintainability: Coordination Maintenance activities – If you are planning to transfer your UPS to a generator maintenance bypass to perform maintenance on the UPS, PM the generator first – If you are planning to perform an open transfer to the building electrical system, inspect your UPS batteries first – Be aware of maintenance activities on building-wide systems that can affect the data center’s chillers, pumps and electrical service

70 Maintainability: Maintenance Windows – Downtime vs. reduced reliability – Reduction in reliability – Design the system to have various maintenance capabilities – Move away from critical loads and toward the utility. “Make sure you plan your maintenance windows carefully between IT and Facilities.”

71 Maintainability: Maintenance Windows – IT maintenance windows are often loaded with IT tasks and therefore are not completely available for facilities tasks – Need to clearly define the true window for facility maintenance. For example: the maintenance window is midnight to 6 am; IT takes an hour to shut down and an hour to start up; the real outage is limited to 1 am to 5 am
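The window arithmetic from this slide can be captured in a small helper so that the real facility window is computed rather than assumed. This is a sketch only: the function name is illustrative and the calendar date is arbitrary, while the times are the ones given on the slide.

```python
from datetime import datetime, timedelta


def facility_window(window_start, window_end, it_shutdown, it_startup):
    """Return (start, end) of the time actually usable for facility work."""
    start = window_start + it_shutdown
    end = window_end - it_startup
    if end <= start:
        raise ValueError("IT shutdown and start-up consume the whole window")
    return start, end


# Slide example: midnight to 6 am, one hour each for IT shutdown and start-up.
# The calendar date is arbitrary.
start, end = facility_window(
    datetime(2024, 1, 7, 0, 0),
    datetime(2024, 1, 7, 6, 0),
    it_shutdown=timedelta(hours=1),
    it_startup=timedelta(hours=1),
)
print(f"Facility work: {start:%H:%M} to {end:%H:%M} ({end - start} available)")
```

Planning facility tasks against the computed four hours, rather than the nominal six, avoids scheduling work that cannot finish before IT needs to restart.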

72 PREDICTABILITY Predictability is the ability to detect the onset of a system failure before it happens. Predictive analysis can be performed by: reviewing PM data; conducting failure analysis; monitoring systems; trending; advanced diagnostics

73 Predictability Reviewing PM data – PM should not only be a time to complete preventative maintenance tasks but should also be used as a diagnostic tool – Use detailed PM guides and complete them so they can be reviewed later – Review your PM task list and add items that can be used to perform predictive analysis – Record before and after data; this is important for setting baselines and conducting trending

74 Predictability Conducting failure analysis – Event occurs – Complete an incident report; the incident report should contain only the facts of what happened during the event – Stabilize the system – Repair the system: take accurate and specific notes; take before and after readings; document

75 Predictability Conduct root cause analysis – It is not necessary to prevent the first, or root, cause from happening – It is merely necessary to break the chain of events at any point so that the final failure cannot occur. Recommendations – Make recommendations to prevent future failures – Implement those changes in the failed system and other similar systems – Where the fault leads back to an initial design problem, redesign is necessary – Where the fault leads back to equipment failure, develop ways to improve the component wear, quality and life – Where the fault leads back to a failure of procedures, it is necessary to either address the procedural weakness or install a method to protect against the damage caused by the procedural failure

76 Predictability Monitoring systems – Install a monitoring system – Monitor as much as you can, as long as you do something with the points you select – Know what you are monitoring and what affects the points – Develop your point list to assist you in predictive analysis – Comprehensive monitoring systems will provide you with the best information

77 Predictability Trending – Once your monitoring system is installed, select key points to trend – Use your trends to develop replacement and PM intervals – Items you can trend: temperatures, pressure, flow rates, usage (time, consumption), load
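One simple way to turn a trended point into a replacement or PM interval is to fit a straight line through the readings and estimate when it will cross an alarm threshold. The sketch below uses an ordinary least-squares fit in plain Python. The readings, the threshold and the function names are all hypothetical, and a real trend may not be linear.

```python
def linear_trend(times, values):
    """Least-squares slope and intercept for values measured at the given times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two readings, one value per time")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must be taken at different times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(times, values, threshold):
    """Time at which the fitted line reaches threshold, or None if not rising."""
    slope, intercept = linear_trend(times, values)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical monthly bearing temperatures in degrees C, for illustration only.
months = [0, 1, 2, 3, 4, 5]
temps_c = [60.0, 60.8, 62.1, 62.9, 64.2, 65.0]

crossing = time_to_threshold(months, temps_c, threshold=75.0)
print(f"Projected to reach 75 C around month {crossing:.1f}")
```

Scheduling the PM or replacement comfortably before the projected crossing is the kind of interval decision the trend data is meant to support.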

78 Predictability Advanced diagnostic techniques – Infrared thermal imaging – Oil analysis – Coolant analysis – Fuel analysis – Ultrasonic analysis – Power quality testing – Battery impedance testing – Vibration testing – Motor analysis – Eddy current analysis – Laser alignment – Balancing

79 Predictability Uses for an IR camera – Belt tension – Pump alignment – Bearings – Electrical connections – Turbo chargers – Roof leaks – Poor insulation – Room seals

80 Predictability Infrared thermography – Is the process of developing visual images that represent variations in the IR spectrum – Any object that is above absolute zero emits IR energy – The IR band used for thermography is roughly 2.0 to 15 microns – The IR spectrum falls outside the range of the human eye – IR cameras detect temperature changes that can indicate conditions or stressors that decrease the life of the equipment – The IR camera can have many uses in a data center. Unless you are the Predator, you will need to use an IR camera
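Wien's displacement law gives a quick check on why this micron range suits equipment inspection: the wavelength at which a warm object radiates most strongly is about 2898 divided by its absolute temperature in kelvin, in microns. The law is standard physics rather than something stated on the slide, and the temperatures below are hypothetical examples.

```python
WIEN_CONSTANT_UM_K = 2897.77  # Wien displacement constant, micron-kelvins


def peak_wavelength_um(temp_celsius):
    """Wavelength (microns) of peak thermal emission for an ideal radiator."""
    temp_kelvin = temp_celsius + 273.15
    if temp_kelvin <= 0:
        raise ValueError("temperature must be above absolute zero")
    return WIEN_CONSTANT_UM_K / temp_kelvin


# Hypothetical surface temperatures, for illustration only.
for label, temp_c in (("room-temperature panel", 25), ("hot connection", 90)):
    print(f"{label} at {temp_c} C peaks near {peak_wavelength_um(temp_c):.1f} microns")
```

Both examples peak well inside the quoted range, and the hotter surface peaks at a shorter wavelength.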

81 Predictability Infrared image examples: fuse connection, overloaded breaker, loose cable, defective breaker

82 Predictability Infrared image examples: pump alignment, water under roof, tank level, missing insulation

83 Predictability Oil analysis – Oil analysis is used to define three basic machine conditions – Oil condition: lubricant viscosity, acidity, etc. – Lubrication system condition: have physical boundaries been violated (e.g., fuel in oil)? – Machine condition: determined by looking for wear particles

84 Predictability Oil analysis – Oil condition is most easily determined by measuring the viscosity, acid number and base number – Additional tests can determine the presence and/or effectiveness of oil additives such as anti-wear additives, antioxidants, corrosion inhibitors and anti-foam agents – Component wear can be determined by measuring the amount of wear metals such as iron, copper, chromium, aluminum, lead, tin and nickel, which can identify when a particular part is wearing – Contamination is determined by measuring water content, specific gravity and the level of silicon. A change in specific gravity typically indicates the presence of other oil or fuel contamination

85 Predictability Sources of wear metals:

Metal | Engines | Gears
Iron | Cylinder heads, rings, gears, crankshafts | Gears, bearings
Chrome | Rings, liners, exhaust valves | Roller bearings
Aluminum | Pistons, thrust bearings, turbo bearings, main bearings | Pump, thrust washers
Nickel | Valve plating, steel alloy from crankshaft, camshafts | Steel alloy from roller bearings
Copper | Lube coolers, main and rod bearings, bushings, turbo bearings | Bushings, thrust plates
Lead | Main and rod bearings, bushings, lead solder | Bushings, grease contamination
Tin | Piston flashing, bearing overlays, bronze alloy | Bearing cage metal
Silver | Wrist pin bushings, silver solder from lube coolers | Silver solder from lube coolers
Titanium | Gas turbine bearings, hubs, turbine blades | N/A

86 Predictability Coolant analysis – Regular coolant testing and routine maintenance can help you achieve maximum system efficiency and save you time and money through less downtime – A cooling system is subject to pitting, corrosion, cavitation, erosion and electrolysis – Although coolants are formulated to help prevent these problems from occurring, coolant analysis will identify whether they are present and determine whether the coolant you're using is providing adequate protection

87 Predictability Fuel analysis – Fuel analysis can point to solutions for filter plugging, loss of power or poor injector performance – Testing bulk fuel storage tanks can verify compliance with required supplier specifications

88 Predictability Ultrasonic inspection – Ultrasound consists of sound waves from about 20 kHz to 100 kHz, which cannot be heard by humans – Unlike IR, ultrasound travels only a short distance from the source – Ultrasonic detectors can be used to detect component wear, fluid leaks, vacuum leaks and steam trap failures – Even though such a leak may not be audible to the human ear, its ultrasound will still be detectable with the appropriate tool

89 Predictability Pressure and vacuum leaks can occur in various locations – Compressed air – Heat exchangers – Boilers – Condensers – Tanks – Pipes – Valves – Steam traps Ultrasonic inspections can detect these small leaks

90 Predictability Mechanical systems suffer from wear through constant operation, and ultrasonic inspection can detect wear in these systems Mechanical applications – Bearings – Lack of lubrication – Pumps – Motors – Gear/gearboxes – Fans – Compressors

91 Predictability Mechanical devices are not the only devices that emit ultrasonic sound. Electrical equipment will also generate ultrasonic waves if arcing, tracking or corona is present Electrical applications – Arcing, tracking and corona – Switchgear – Transformers – Insulators – Circuit breakers

92 Predictability Power quality testing – Hardware and software are frequently blamed for all types of problems that may actually originate from poor power quality within your building’s electrical distribution system – In many cases, the number one indication that you have a power quality problem is intermittent, unexplained failures of technology equipment or processes – Responding service technicians may complete a work report with the words “no trouble found”

93 Predictability Impedance testing – A substitute for performing a full load test – The internal resistance of a cell can be determined by how that cell responds to a momentary load – The instantaneous voltage drop and the load current applied are used to calculate the resistance – Most cell testers can check the impedance with the battery online or offline
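
The resistance calculation described on this slide can be sketched as follows. This is a minimal illustration, assuming a simple DC model in which internal resistance equals the instantaneous voltage drop divided by the applied load current. The function name and the sample readings are hypothetical and do not come from the presentation.

```python
def cell_internal_resistance(v_open: float, v_loaded: float, load_current: float) -> float:
    """Estimate a cell's internal resistance (ohms) from a momentary load test.

    v_open       -- cell voltage just before the load is applied (volts)
    v_loaded     -- cell voltage at the instant the load is applied (volts)
    load_current -- current drawn by the momentary load (amperes)
    """
    if load_current <= 0:
        raise ValueError("load_current must be positive")
    # Ohm's law applied to the instantaneous voltage drop
    return (v_open - v_loaded) / load_current


# Hypothetical readings, for illustration only
resistance = cell_internal_resistance(v_open=2.25, v_loaded=2.20, load_current=50.0)
print(f"Estimated internal resistance: {resistance * 1000:.2f} milliohms")
```

Trending this value per cell over time, rather than reading it once, is what makes it useful as a predictive measure.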

94 Predictability Vibration analysis – The level and frequency of the vibration of rotating machinery are not distinguishable to the human touch – Can be used to discover and diagnose a wide range of problems related to rotating equipment

95 Predictability Vibration monitoring can detect: – Unbalance – Eccentric rotors – Misalignment – Mechanical looseness or weakness Types of systems that vibration analysis should be performed on: – Generators – Cooling tower fans – Chillers – Pumps – CRAH/CRAC – Air handlers

96 Predictability Tests used to perform motor analysis – Infrared – Vibration analysis – Surge comparison – Motor current signature comparison Motor faults or conditions that can be detected – Winding short circuits – Open coils – Improper torque settings – Other mechanical problems

97 Predictability Types of motor analysis – Surge comparison testing identifies insulation deterioration by applying a high-frequency transient surge to equal parts of a winding and comparing the resulting voltage waveforms – Motor Current Signature Analysis (MCSA) provides a non-intrusive method of detecting mechanical and electrical problems

98 Predictability Eddy current analysis – Detects surface and subsurface defects – Detects variations in alloy, heat treatments, hardness, structure and other physical metallurgical conditions – Should be done on chillers each year when the tubes are being cleaned

99 Predictability Alignment inspection – Shafts and pumps should be properly aligned, which is best accomplished by using laser alignment – When machines are improperly aligned, added loads are placed on the bearings and couplings, which can result in early and unplanned failures

100 Predictability Balance – Proper balance reduces wear and tear on bearings, shafts and motors – Imbalance can be detected with the use of infrared cameras and vibration meters – Requires balancing equipment to verify and correct balancing

101 SCALABILITY Scalability is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged without impact to operations. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added

102 Scalability What do we want… a flexible, scalable, reliable, highly performing, and highly available computer infrastructure that adapts to a wide range of continuously evolving and challenging demands

103 Scalability What does it take? – Requirements analysis – Basis of Design (BOD) – Design: modular approach, avoid excessive equipment, pay as you go – Expansion techniques

104 Scalability Good planning and decisions are the foundation of a highly scalable facility – At no point in the lifecycle of a mission-critical facility can you have greater impact on scalability than during the design phase – Start with a Requirements Analysis (RA) of your data center needs – Use the results of your RA to develop a Basis of Design (BOD) – The RA and BOD are living documents and you need to update them as changes occur

105 Scalability Requirements analysis – Growth modeling takes the hardware platform requirements and turns them into space, power and cooling requirements – Considers both current and future technology impacts on space, power and cooling – Typically done for 3+ year planning – This leads to the critical infrastructure’s BOD
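
A growth model of the kind described on this slide could be sketched as below. This is only an illustration under stated assumptions: the rack count, per-rack power density, growth rate and floor area per rack are hypothetical inputs, and cooling load is approximated by assuming all critical power ends up as heat (1 ton of refrigeration is about 3.517 kW). A real requirements analysis would include many more factors.

```python
import math
from dataclasses import dataclass

KW_PER_TON = 3.517  # 1 ton of refrigeration removes about 3.517 kW of heat


@dataclass
class YearRequirement:
    year: int
    racks: int
    power_kw: float
    cooling_tons: float
    floor_area_sqft: float


def growth_model(initial_racks, kw_per_rack, annual_growth, sqft_per_rack, years):
    """Project space, power and cooling needs for year 0 through `years`."""
    plan = []
    for year in range(years + 1):
        projected = initial_racks * (1 + annual_growth) ** year
        # Round first to suppress floating-point noise, then round up to whole racks
        racks = math.ceil(round(projected, 6))
        power_kw = racks * kw_per_rack
        plan.append(
            YearRequirement(
                year=year,
                racks=racks,
                power_kw=power_kw,
                cooling_tons=power_kw / KW_PER_TON,  # assumes all power becomes heat
                floor_area_sqft=racks * sqft_per_rack,
            )
        )
    return plan


# Hypothetical inputs: 100 racks at 5 kW each, 20% yearly growth, 25 sq ft per rack
for row in growth_model(100, 5.0, 0.20, 25.0, years=3):
    print(row)
```

The yearly rows give the space, power and cooling figures that would feed the BOD.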

106 Scalability Basis of Design – Roadmap to a reliable and quality-designed site – More often than not, the BOD is lacking in detail – Defines the requirements of the site – Defines the reliability, availability, maintainability, scalability and operational parameters – Should be updated regularly

107 Scalability Designing with scalability in mind – Reduced initial cost – Reduced time to install equipment – Reduces the need to purchase large systems – Not an advantage for fast-growing facilities Modular design can be more precisely matched to reflect: – Lower capital investment (“pay as you go” approach) – Budget/capital constraints – Controlled growth – Unanticipated growth

108 Scalability Equipment rooms – When possible, design equipment rooms with space for expansion – Design hallways, corridors and doors to allow access for new equipment – Conserve wall space for future panels and equipment

109 Scalability Switchgear – Expansion breakers – Expansion cells – Be aware of bussing configuration; use fully rated bus throughout – Use larger frame breakers with adjustable trips – Have expansion in your Programmable Logic Controller (PLC): have access to programming codes; have a current backup

110 Scalability UPS systems – Size parallel cabinet and static switch for full build-out – If modules are upgradeable, size feeders to full build-out – If equipped with sync control cabinet, size for full build-out Remember – When you start to add more than 3 modules in parallel, the redundancy begins to drop
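
One way to illustrate why a single spare module offers less protection as more modules are paralleled is a simple "at least N of N+1" probability model. This is an assumed model and not the presentation's own calculation: it treats modules as identical and independent, and the per-module availability of 0.99 is a hypothetical figure.

```python
def n_plus_one_availability(modules_required: int, module_availability: float) -> float:
    """Probability that at least `modules_required` of `modules_required + 1`
    independent, identical modules are working (an N+1 arrangement)."""
    if modules_required < 1:
        raise ValueError("modules_required must be at least 1")
    if not 0.0 <= module_availability <= 1.0:
        raise ValueError("module_availability must be between 0 and 1")
    n = modules_required + 1
    a = module_availability
    # Either all n modules are up, or exactly one of the n modules is down
    return a ** n + n * a ** (n - 1) * (1 - a)


# Hypothetical per-module availability of 0.99
for required in range(1, 7):
    print(required, "+1 modules:", round(n_plus_one_availability(required, 0.99), 6))
```

Under these assumptions the result falls with every module added, because more modules must keep working while still only one is allowed to fail.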

111 Scalability Critical distribution – Dual main input: allows for the possibility of a second source to supply load during cutover or expansion activities; could be used to connect temporary equipment for emergencies; load bank testing – Spare breakers: allow for additional PDUs and expected new load; up-frame the breaker so that larger loads may be added, e.g. use 400A frame breakers with 225A rating plugs to power PDUs

112 Scalability Power Distribution Units (PDUs) – Typically you run out of circuits before capacity – Install a junction box below the floor to allow for additional power whips. Bottom plates usually do not have enough knock-outs – Order PDUs with additional 225A sub-fed breakers to support additional Remote Power Panels (RPPs) – Consider in-row PDUs to save space

113 Scalability EPO systems – Plan on the fact that the EPO system will have items added to and removed from it – The EPO should be an engineered device and not a cloud on the drawing stating “by others” – System should be documented – Should have Active, Test and Off modes of operation – Installed with isolation relays – Centrally located in an EPO control cabinet with room for expansion

114 Scalability Chilled water systems – When possible, up-size piping – Have additional valves installed under the floor so you can add CRAH units as needed – Have valves installed for additional pumps and chillers – Have a valve connection that can be easily hooked up to a temporary chiller

115 Scalability Monitoring systems – Make sure that the system is expandable – Some systems are not upgradable, while others require adding another module to the communication trunk – Make sure you will not be locked in with an uncooperative manufacturer – Have access to the programming function and required passwords

116 Scalability Expansion techniques – Implementation of new systems while the facility is in “production” is a business reality – The need for hot cutovers is arising more often. For safety reasons, a hot cutover should be a last resort – With proper upfront planning, the need for hot taps and cutovers can be reduced or eliminated

117 UPTIME Uptime (Ŷ) is a measure of the time a system has been "up", running and available. It came into use to describe the opposite of downtime, times when a system was not operational. ρ = Reliability, ά = Availability, ц = Maintainability, ∏ = Predictability, ∑ = Scalability

118 RELIABILITY AVAILABILITY MAINTAINABILITY PREDICTABILITY SCALABILITY Reliability (ρ) is the ability of a system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances

119 RELIABILITY AVAILABILITY MAINTAINABILITY PREDICTABILITY SCALABILITY Availability (ά) is the ability of a system to tolerate failures. It refers to the time that a system is available to its users. This means the process continues to be served through the failure and that, ideally, the failure is transparent to the user

120 RELIABILITY AVAILABILITY MAINTAINABILITY PREDICTABILITY SCALABILITY Maintainability (ц) is defined as the probability of performing a successful repair action or preventative maintenance within a given time. In other words, maintainability measures the ease and speed with which a system can be restored to operational status
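
The "probability of repair within a given time" definition can be made concrete with a common textbook model. The sketch below assumes repair times follow an exponential distribution with a known mean time to repair (MTTR). That distribution, the function name and the sample numbers are assumptions for illustration and are not taken from the presentation.

```python
import math


def maintainability(t_hours: float, mttr_hours: float) -> float:
    """Probability that a repair is completed within t_hours,
    assuming exponentially distributed repair times with mean mttr_hours."""
    if mttr_hours <= 0:
        raise ValueError("mttr_hours must be positive")
    if t_hours < 0:
        raise ValueError("t_hours must not be negative")
    return 1.0 - math.exp(-t_hours / mttr_hours)


# Hypothetical example: mean time to repair of 4 hours, repair window of 8 hours
print(f"Chance of restoring service within 8 h: {maintainability(8, 4):.1%}")
```

A shorter MTTR, for example from better spares, access and procedures, raises this probability for any given repair window.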

121 RELIABILITY AVAILABILITY MAINTAINABILITY PREDICTABILITY SCALABILITY Predictability (∏) is the ability to detect the onset of a failure before it happens. Predictive analysis can be performed by: – Reviewing PM data – Conducting failure analysis – Monitoring systems – Trending – Advanced diagnostics

122 RELIABILITY AVAILABILITY MAINTAINABILITY PREDICTABILITY SCALABILITY Scalability (∑) is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added

123 UPTIME ρ * ά * ц * ∏ * ∑ = Ŷ
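
The multiplicative relationship on this slide can be illustrated with a short sketch. The presentation does not say how each factor is measured, so the code assumes each one is expressed as a fraction between 0 and 1. The function name and the sample scores are hypothetical.

```python
from math import prod


def uptime_score(reliability, availability, maintainability, predictability, scalability):
    """Multiply the five RAMPS factors, each assumed to be a fraction in [0, 1]."""
    factors = (reliability, availability, maintainability, predictability, scalability)
    for value in factors:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each factor must be between 0 and 1")
    return prod(factors)


# Hypothetical scores: a strong design paired with weak maintainability
strong = uptime_score(0.99, 0.99, 0.99, 0.99, 0.99)
weak_link = uptime_score(0.99, 0.99, 0.50, 0.99, 0.99)
print(f"All factors strong: {strong:.3f}")
print(f"One weak factor:    {weak_link:.3f}")
```

Because the factors multiply, the result can never exceed the weakest factor, which matches the deck's point that design alone does not determine uptime.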

124 RELIABILITY AVAILABILITY MAINTAINABILITY PREDICTABILITY SCALABILITY Be sure to look at more than just the design of your facility… don’t miss a step. Use RAMPS to achieve maximum uptime!

125 RAMPS © Reliability, Availability, Maintainability, Predictability, Scalability Presented by Joe Soroka
