1 RAMPS©: Reliability, Availability, Maintainability, Predictability, Scalability
Presented by Joe Soroka
For additional information visit

While budgets may be tighter, the requirement for maximum uptime has not gone away. The design of your facility is only one piece of the pie that will affect your site’s uptime. It is important to be aware of how Reliability, Availability, Maintainability, Predictability and Scalability all affect your site’s uptime.
(Diagram labels: Reliability, Availability, Maintainability, Predictability, Scalability)

3 RELIABILITY Reliability is the ability of a system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances

4 Reliability
What is reliability? Weibull; Markov; reward modeling
Modeling: IEEE Gold Book; procedures (accurate, confirmed/tested)
Equipment selection: generator; UPS systems; EPO systems; switchgear; monitoring systems

5 Reliability Reliability Reliability modeling Equipment Commissioning
Operations & maintenance For additional information visit

6 Reliability
Bathtub curve of reliability
Infant mortality: burn-in/load testing; commissioning
Useful life: proper maintenance
End of life: identify and replace equipment prior to entering this period
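One common way to picture the three regions of the bathtub curve is the Weibull hazard function, h(t) = (shape/scale) * (t/scale)^(shape - 1): a shape below 1 gives a falling failure rate (infant mortality), a shape of 1 a constant rate (useful life), and a shape above 1 a rising rate (end of life). The sketch below is only an illustration; the function name and all parameter values are hypothetical and do not come from the presentation.

```python
def weibull_hazard(t, shape, scale):
    """Weibull failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


if __name__ == "__main__":
    # Hypothetical shape values, chosen only to show the three regions.
    regions = [("infant mortality", 0.5), ("useful life", 1.0), ("end of life", 3.0)]
    for label, shape in regions:
        early = weibull_hazard(1.0, shape, scale=10.0)
        late = weibull_hazard(5.0, shape, scale=10.0)
        if late < early:
            trend = "falling"
        elif late > early:
            trend = "rising"
        else:
            trend = "constant"
        print(f"{label}: failure rate is {trend}")
```

Burn-in and commissioning aim to get equipment past the falling region before it carries critical load; replacement planning aims to act before the rising region begins.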

7 Reliability
In a series system, reliability is no greater than that of the weakest component.
In a complex system you need to identify and quantify the importance of each component.
A reliability block diagram is a graphical representation of the system's components and how they relate to one another in terms of reliability.
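The series rule can be checked with a small calculation. The sketch below assumes component failures are independent, which is a simplification, and uses made-up reliability values for a hypothetical utility feed, ATS and UPS chain; the function names are illustrative only.

```python
from math import prod


def series_reliability(components):
    """All blocks must work: multiply the individual reliabilities."""
    return prod(components)


def parallel_reliability(components):
    """At least one block must work: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in components)


if __name__ == "__main__":
    # Hypothetical values for a feed -> ATS -> UPS chain.
    chain = [0.99, 0.995, 0.98]
    print("series:", series_reliability(chain))
    print("weakest component:", min(chain))
    # Two redundant UPS modules in parallel (hypothetical 0.98 each).
    print("redundant UPS pair:", parallel_reliability([0.98, 0.98]))
```

The series result is always at or below the weakest block, which is why a reliability block diagram is useful for spotting where redundancy pays off most.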

8 Reliability
Many of the reliability design ideas share a common philosophy with those recommended for availability. This is because there is a very close relationship between reliability and availability. While reliability is about how long an application runs between failures, availability is the ability of a system to tolerate failures and how long it is accessible to the users. When a system's components and services are highly reliable, they cause fewer failures from which to recover and thereby help increase availability.

9 Reliability Equipment Major manufacturers
Past experiences Local maintenance support Parts distribution centers Fine line between leading edge and bleeding edge Formal submittal review meetings For additional information visit

10 Reliability Equipment
Generator’s isolation valves
ATS bypass
TVSS indicators and alarms
Lightning protection
EPO systems: wiring; control relays; covers; diagrams; testing; day 2 changes

11 Reliability Equipment
Generators: redundant batteries; battery monitoring; fuel level monitoring; water jacket heater isolation valves; silicone heater hoses; coolant level pre-alarms, both cores; water separators (Racor filters) with alarms; engine diagnostic link

12 Reliability Equipment UPS systems Dual input
Maintenance bypass cabinet Advanced monitoring Battery monitoring Redundant battery strings for VRLAs Site specific procedures For additional information visit

13 Reliability Equipment Automatic Transfer Switches (ATS)
Maintenance bypass or wrap around breakers Phase sync monitoring Pause Neutral/dual solenoids Monitoring Transient Voltage Surge Suppression (TVSS) Indication of operation Surge counter For additional information visit

14 Reliability Equipment EPO systems
Wiring in conduit and not open plenum Control relay coils should not be energized until activation Secondary covers installed over the EPO buttons Detailed and accurate schematics diagrams System should be designed so it can be tested System should be capable of making day 2 changes without risk Part of an engineered drawing and not a cloud saying “by others” For additional information visit

15 Reliability Equipment
Thermal runaway
Increased heat density: reduces the time to thermal runaway; increases the need for a reliable HVAC system
Specialized HVAC systems: possibly switching from emergency to UPS power
Long UPS battery runtimes may be unclear
Rack layout and equipment airflow direction: cold/hot aisle; enclosed hot aisles
Rack type: doors; vents; fans

16 Reliability Equipment
Water storage
Chilled water: in the event of a power outage or temporary chiller failure, do you have the capability to ride through?
Makeup water: how reliable is the city water supply? Do you have diverse sources (water storage tanks, a well, other water sources)?

17 Reliability Commissioning
Commissioning – With each project being unique, there is a need to determine how much commissioning is appropriate for the project. Factors that influence this decision include: Building’s mission-criticality Facility’s use or purpose Complexity of the building’s systems Building type and size Project type, whether existing building system or retrofit, or both Building tenant or occupant demographics System reliability requirements Owner’s objective in commissioning the building; IAQ, system reliability and/or energy efficiency Project budget For additional information visit

18 Reliability Operation and Maintenance
Use a pilot/copilot approach: commercial airplanes do not fly with just one pilot, so why would you?
Standardize as much as possible: standard procedures; standard processes
Use a Computerized Maintenance Management System (CMMS): timely reports and schedules; accurate information; archive of past performance; instant access to information

19 AVAILABILITY
Availability is the ability of a system to tolerate failures. It refers to the time that a system is available to its users. This means the process continues to be served through the failure and that, ideally, the failure is transparent to the user.

20 Availability Availability Design Resources Procedures
For additional information visit

21 Availability
Availability is typically expressed by the number of nines.
Availability | # of nines | Downtime per year
90% | 1 nine | 36.5 days/year
99% | 2 nines | 3.65 days/year
99.9% | 3 nines | 8.76 hours/year
99.99% | 4 nines | 52 minutes/year
99.999% | 5 nines | 5 minutes/year
99.9999% | 6 nines | 31 seconds/year
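The downtime figure for each number of nines is plain arithmetic: the unavailable fraction multiplied by the length of a year. A minimal sketch, assuming a 365-day year (the function name is illustrative, not from the presentation):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def downtime_seconds_per_year(availability_percent):
    """Expected yearly downtime, in seconds, for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        seconds = downtime_seconds_per_year(pct)
        print(f"{pct}% -> {seconds / 3600:.3f} hours/year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from four to five nines is so much harder to achieve than the step from two to three.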

22 Availability
Failures can be attributed to the following causes:
Design failures: this class of failures takes place due to inherent design flaws in the system. In a well designed system, this class of failures should make a very small contribution to the total number of failures.
Infant mortality: this class of failures causes newly manufactured hardware to fail. This type of failure can be attributed to manufacturing problems like poor soldering, leaking capacitors, etc. These failures should not be present in systems leaving the factory, as these faults will show up in proper factory system burn-in tests.

23 Availability
Random failures: random failures can occur during the entire life cycle of a system. These failures can lead to system failures. Redundancy is provided to recover from this class of failure.
Wear out: once a hardware module has reached the end of its useful life, degradation of component characteristics will cause hardware modules to fail. These types of faults can be weeded out by preventive maintenance and routing of hardware.

24 Availability Design Designing systems with sufficient levels of redundancy Eliminating single points of failure Availability design guidelines Consult your engineer TIA Standard - TIA 942 Uptime Institute – Tier Definition For additional information visit

25 Availability Design
System design should have multiple paths, active or passive, depending upon the site reliability requirements.
If redundant paths need to be value engineered (VE) out to meet the project budget, consider adding the breaker or valve now; later, when the budget allows, add the actual feed.
By adding the breaker or valve up front you will be able to install temporary cable or piping when an emergency arises.

26 Availability Design
When performing maintenance that decreases system redundancy, move the reduction of availability away from the critical load and toward the utility as much as possible.
For example, if you have a system plus system design and you are going to take the UPS out of service for maintenance, do not just open the UPS system and let the downstream dual-cord devices and static transfer switches handle the loss of redundancy.
Place the UPS in maintenance bypass to continually feed the second source with stable power.
Better yet, place the UPS on generators or an alternate UPS supply to avoid sending unprotected utility power to the critical load.

27 Availability Resources Technical resources Parts Onsite spares
Operation staff Response staff Maintenance & repair staff Parts Onsite spares Manufacturer spares Vendor spares Supply houses For additional information visit

28 Availability Operation Staff Resources
Whether you are using in-house or contracted staff, it is important to ensure they have the proper resources.
Proper access to the facility: if using a key card system, what happens when the card readers lose power? Who has the keys?
Do you have all of your operation staff’s contact details? Cell numbers and home numbers; company and personal e-mails

29 Availability Response Staff Resources Emergency response
Types of emergency responses Additional operation staff Electrical, mechanical & plumbing contractors General construction Testing and repair firms Fire and security Hazardous material spill List of suppliers and vendors Emergency contact information Alternate contact information Contracts in place to execute after hours support Meet them before an emergency arises, have them at the site for lunch For additional information visit

30 Availability Maintenance & Repair Staff Resources
Do you have the necessary contracts in place? Is there maintenance your operation staff can perform in house? Do you have alternate contact numbers for your maintenance providers? Do they have proper access to the facility? Do you have a second string waiting on the sidelines in case of an emergency? For additional information visit

31 Availability Parts Resources Parts and supplies
Define and assess critical parts Stock critical parts onsite Have an annual budget for spare parts that increases a little each year Verify that your vendors and contractors have spare parts handy Identify supply houses and suppliers that have parts you need Have after hours phone number(s) to get parts from supply houses Have contracts in place and make sure they are active For additional information visit

32 Availability Procedures Operation Maintenance Emergency
Troubleshooting For additional information visit

33 Availability Operation Procedures
Have detailed procedures developed specifically for your site.
Procedures should be tested and verified.
Procedures should be inventoried and updated regularly.
Operating procedures should be placed at the point of use and not locked up in the building manager’s office.

34 Availability Maintenance Procedures Maintenance procedures
Have detailed procedures for maintenance Ask your maintenance provider to furnish all of the required maintenance procedures prior to performing maintenance, so you can review and comment on them Use detailed procedures during your maintenance activities Review procedures after the maintenance has been completed For additional information visit

35 Availability Emergency Procedures
In case of an emergency, where are your procedures? Can you access them? Are they at multiple locations?
During an emergency is not the time to try to figure out how to restore a system.
Perform dry runs on the procedures at least once a year.
Update and change, as required.

36 Availability Troubleshooting Procedures Manuals Drawings
Available Correct Drawings Available and complete As-builts Develop troubleshooting flow diagrams For additional information visit

37 MAINTAINABILITY Maintainability is defined as the probability of performing a successful repair action or preventative maintenance within a given time In other words, maintainability measures the ease and speed with which a system can be restored to operational status
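To make the "probability of repair within a given time" definition concrete, a frequently used simplification assumes repair times are exponentially distributed, giving M(t) = 1 - exp(-t / MTTR). That distribution is an assumption made here for illustration, not something the presentation states, and the numbers below are hypothetical.

```python
import math


def maintainability(t_hours, mttr_hours):
    """Probability a repair finishes within t_hours, assuming
    exponentially distributed repair times with mean mttr_hours."""
    if t_hours < 0 or mttr_hours <= 0:
        raise ValueError("t_hours must be >= 0 and mttr_hours must be > 0")
    return 1.0 - math.exp(-t_hours / mttr_hours)


if __name__ == "__main__":
    # Hypothetical example: mean time to repair of 2 hours.
    for window in (1, 2, 4, 8):
        p = maintainability(window, mttr_hours=2.0)
        print(f"repair done within {window} h: {p:.1%}")
```

Under this model, anything that shortens the mean time to repair (access, labeling, spares on site, tested procedures) raises the probability of restoring service inside a fixed window.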

38 Maintainability Design Equipment Staff Location Maintenance program
Training Coordination Maintenance windows For additional information visit

39 Maintainability Design Goals of Maintainability
Maximize efficiency and accuracy of on-line replacement of system components Facilitate and minimize troubleshooting time at each level of maintenance activity Allow test, checkout, troubleshooting and repair procedures to be unit-specific and structured to aid in identification of faulty units, then sub units Reduce downtime Provide easy access to malfunctioning components Allow for high degree of standardization Minimize time and cost of maintenance training Simplify new equipment design and shorten design time by using previously developed, standard building blocks For additional information visit

40 Maintainability Design Equipment Access Labeling
Minimize troubleshooting time Monitoring Procedures Standardization Test and service points For additional information visit

41 Equipment Accessibility
Maintainability Equipment Accessibility Design Accessibility refers to the relative ease with which a system can be accessed Sufficient clearance to use the tools needed to complete the tasks Adequate space to permit convenient removal and replacement of components Adequate visual exposure to the task area Adequate safety and working clearances Adequate space for required rigging equipment Adequate hallway, corner and door clearances back to loading dock For additional information visit

42 Maintainability Ease of Removal and Replacement Design
Equipment rooms should be designed so that rapid, safe and easy removal and replacement of malfunctioning components can be accomplished by one technician, when possible.
With space at a premium in a data center, the tendency is to design the equipment room to the minimum code requirements. This saves space in the design but in many cases increases the time required to maintain and repair a system. These minimum clearance spaces will cost more in the long run. For example:
1. Safety is hampered when dealing with minimum clearance: backing into another panel and tripping a breaker, or tripping on a housekeeping pad and landing on a rotating pump.
2. Downtime increases: either a part is not changed in time because it is difficult to replace, or during an outage the time to repair is increased due to space limitations.

43 Maintainability Labeling Design Labeling should:
Identify a specific device Identify the purpose or function of a specific device Present critical information Present safety Information Should be legible Should use contrasting colors Ensure that your labeling is controlled to ensure its accuracy and standardization Periodic inspections and examinations Accuracy of Identification required Time available for recognition Location and distance at which identification must be read Level and color of illumination Criticality of the function identified Label design and identifying information used within and between systems For additional information visit

44 Maintainability Minimize Troubleshooting Time Design
Comprehensive monitoring Procedures Standardization Test and service points For additional information visit

45 Maintainability Monitoring Design Monitoring capabilities
Event notification Event reconstruction Event mitigation Determine maintenance frequencies Allow for accurate and efficient communication of events For additional information visit

46 Maintainability Monitoring Design
What type of monitoring system do I need?
No monitoring: not recommended for any mission critical facility
Remote Alarm Status Panel (RASP): no trending or time stamping; gives visual and audible notification; usually for one device or system
Monitoring with dry contacts: limited number of points; limited time stamping; status is either on or off
Serial interfaces: comprehensive data; data points with values rather than on/off; flexible and expandable

47 Maintainability Procedures Design Emergency Operating Procedures (EOP)
Developed for failure modes Readily available for use – locate at point-of-service Should be developed and tested during the commissioning phase Detailed – switch level Update any changes discovered Method Operating Procedure (MOP) Developed for all operations Have back-out procedures included Use with pilot/copilot approach For additional information visit

48 Maintainability Procedures Design Trouble-shooting procedures
Trouble-shooting flow charts Restoration procedures Maintenance procedures Detailed procedures Include measure points for future trending Used and completed during maintenance For additional information visit

49 Maintainability Procedures Design Common procedures error traps
In-field decisions Vague instructions Undefined or uncommon terms Burdensome or complex instruction Multiple actions Inconsistent statements or actions Misleading or missing critical information Interfacing with external procedures Lack of ownership Lack of quality assurance review For additional information visit

50 Maintainability Standardization Design
Standardization ensures consistency and comparability of knowledge and parts Acronyms Reduce confusion Manufacturers Reduced spare part counts Familiarization with operations and maintenance Layouts Increase ease-of-use Labeling For additional information visit

51 Maintainability Test and Service Points Design
Test points provide a means for conveniently and safely determining the operational status of equipment and isolating malfunctions Test points, strategically placed, make signals available to the technician for checking, adjusting or troubleshooting Service points provide means for lubricating, filling, draining, charging and similar functions For additional information visit

52 Maintainability Test and Service Points Design
General principles for test and service points Avoiding need for frequent testing and service Standardization Test and service point compatibility Labeling dangerous test and service compatibility Distinctively different connectors and fittings Location of test, service and adjustment points For additional information visit

53 Maintainability Equipment
Ordering the right accessories with your equipment can make a big difference when it comes to the maintainability of your equipment When ordering equipment or reviewing design documents, solicit input from your operations and maintenance staff involved It’s much cheaper to order it right the first time, than to upgrade it later in the field For additional information visit

54 Maintainability Generators Equipment Water separators for fuel
Radiator water level Isolation valves on water jacket heaters Generator-mounted circuit breakers Battery cables Battery monitoring Fuel-level monitor For additional information visit

55 Maintainability Switchgear Equipment Annual infrared thermal scanning
Protective relays Breaker testing PLC Code Hard copy Up-loadable copy Beware of small UPS systems Station batteries Internal cleaning Mimic bus For additional information visit

56 Maintainability Automatic Transfer Switches Equipment
Maintenance bypass Order it with a maintenance bypass or design the system to have a manually operated breaker bypass to wrap around the ATS to both sources For additional information visit

57 Maintainability UPS Systems Equipment AC filter capacitors
3-5 years DC filter capacitors Transfer circuits Capture the transfer between UPS and bypass Procedures Detail PM procedures Capture before and after readings Calibration/maintenance Capture details Don’t just do a “dust and clean” PM For additional information visit

58 Maintainability Batteries Equipment
VLA (flooded): vented lead acid; quarterly maintenance
VRLA (sealed): valve-regulated lead acid; semi-annual maintenance
Float voltage; room temperature
Proper maintenance: water as required; battery monitoring
Batteries are found in: UPS systems; generators; switchgear PLCs and breakers; telecom equipment

59 Maintainability PDU’s Equipment Shutdown alarms EPO circuits
Identify and understand them EPO circuits If used, is it maintainable? Monitoring Main Sub-panels Branch circuit breakers Snap-in vs. bolt-in breakers Use bolt-in breakers only Transformers K-rated For additional information visit

60 Maintainability Load Banks Equipment Permanently installed load banks
Generator testing Annual load test Troubleshooting UPS system testing Paralleling gear Set-up and calibration For additional information visit

61 Maintainability Water Source Equipment
Alternate water source needs to be capable of supplying water, so that the primary water source can be removed for maintenance Usage metering should be on each water source Types of alternate water source City water Wells Storage tanks For additional information visit

62 Maintainability Pumps Equipment Alignment Bearings
Will reduce wear and tear on shafts, bearings and seals Reduce vibration Decrease current draw Bearings Accessible grease fittings Grease as required Infrared thermal scanning Motor problems Alignment issues For additional information visit

63 Maintainability CRAH/CRAC Equipment
Temperature and humidity set points Should be set the same Humidifiers Have replacements for bulbs and canisters Filters Use a pre-filter in dirty locations Make sure your dirty filter Differential Pressure (DP) switch is set correctly Alignment Proper alignment will reduce wear on the shaft and bearings Bearings Grease when required Infrared thermal heat scan Refrigerant leaks can activate fire alarms For additional information visit

64 Maintainability Staff
Dispatched service: verify your vendor’s qualifications as a company; request resumes of the people performing work at your site; review their technical aptitude; verify your vendor’s training programs
Onsite operation and maintenance staff: verify that they are managed correctly (in-house or contracted); verify your staff’s resumes and qualifications; verify training programs

65 Maintainability Location
Location and access of valuable resources is important when situations arise 3:00 am Sunday morning is not the time to try to locate fuses required to get your site up and running There are various resources you should consider before the need arises; Equipment Technicians Parts Procedures Manuals Drawings For additional information visit

66 Maintainability Training
It is important that your operation and maintenance staff is adequately and regularly trained.
When an emergency occurs they should have the confidence and experience to complete the task at hand.
Available training methods: self-paced; classroom; web-based; manufacturer’s training; on-the-job training; procedure development; training module development; test beds; simulators

67 Maintainability Coordination
Work activities – it is important to closely coordinate maintenance activities, to maintain a reliable, efficient and safe working environment During outage windows we have the tendency to plan too many activities at once. Make sure you don’t have too many people working in the same space at once For additional information visit

68 Maintainability Coordination
Pay particular attention to the planning of your maintenance activities.
CRAC units: refrigerant leaks will activate the fire systems; make sure you disable the fire system* prior to charging a system.
Under-floor cleaning: can activate the fire alarm system; make sure you deactivate the fire alarm system* before you start to clean under the floor.
There are other maintenance activities and tests that could mistakenly set off the fire alarm system.
*When you disable a fire alarm system, make sure you follow the procedures required by OSHA, NFPA, local authorities, your company and your insurance underwriter. This could include, but is not limited to: additional fire extinguishers, posting a fire watch, notification, special procedures, and tagging.

69 Maintainability Coordination
Maintenance activities
If you are planning to transfer your UPS to a generator maintenance bypass to perform maintenance on the UPS, PM the generator first.
If you are planning to perform an open transfer to the building electrical system, inspect your UPS batteries first.
Be aware of maintenance activities on building-wide systems that can affect the data center: chillers; pumps; electrical service

70 Maintainability Maintenance Windows Maintenance windows
Downtime vs. reduced reliability Reduction in reliability Design system to have various maintenance capabilities Move away from critical loads and towards utility “Make sure you plan your maintenance windows carefully between IT and Facilities.” For additional information visit

71 Maintainability Maintenance Windows
IT maintenance windows are often loaded with IT tasks and therefore are not completely available for facilities tasks.
You need to clearly define the true window for facility maintenance. For example:
The maintenance window is midnight to 6 am.
IT takes an hour to shut down and an hour to start up.
The real outage is limited to 1 am to 5 am.
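The window arithmetic in this example can be written down so that IT and Facilities agree on the same numbers before the outage. This is a small sketch; the function name and the calendar date are hypothetical, and only the times of day come from the slide.

```python
from datetime import datetime, timedelta


def facility_window(start, end, it_shutdown, it_startup):
    """Return the (start, end) of the time left for facilities work
    after IT shutdown and startup are taken out of the window."""
    usable_start = start + it_shutdown
    usable_end = end - it_startup
    if usable_end <= usable_start:
        raise ValueError("no time left for facilities work in this window")
    return usable_start, usable_end


if __name__ == "__main__":
    # Midnight to 6 am on an arbitrary, hypothetical date.
    window_start = datetime(2024, 1, 7, 0, 0)
    window_end = datetime(2024, 1, 7, 6, 0)
    begin, finish = facility_window(
        window_start, window_end,
        it_shutdown=timedelta(hours=1),
        it_startup=timedelta(hours=1),
    )
    print("facilities work:", begin.time(), "to", finish.time())
    print("usable duration:", finish - begin)
```

A six-hour window leaves four hours for facilities work in this case, so the task list should be sized to four hours, not six.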

72 PREDICTABILITY
Predictability is the ability to detect the onset of a system failure before it happens.
Predictive analysis can be performed by: reviewing PM data; conducting failure analysis; monitoring systems; trending; advanced diagnostics

73 Predictability Reviewing PM data
PM should not only be a time to complete preventative maintenance tasks, but also be used as a diagnostic tool Use detailed PM guides and complete them so they can be reviewed later Review your PM task list and add additional items that can be used to perform predictive analysis Record before and after data. This is important to set baselines and conduct trending For additional information visit

74 Predictability Conducting failure analysis Event occurs
Complete an incident report Incident report should only contain facts of what happened during the event Stabilize the system Repair the system Take accurate and specific notes Take before and after readings Document For additional information visit

75 Predictability
Conduct root cause analysis
It is not necessary to prevent the first, or root, cause from happening. It is merely necessary to break the chain of events at any point so that the final failure cannot occur.
Recommendations
Make recommendations to prevent future failures.
Implement those changes in the failed system and other similar systems.
Where the fault leads back to an initial design problem, redesign is necessary.
Where the fault leads back to equipment failure, develop ways to improve component wear, quality and life.
Where the fault leads back to a failure of procedures, it is necessary either to address the procedural weakness or to install a method to protect against the damage caused by the procedural failure.

76 Predictability Monitoring systems
Install a monitoring system.
Monitor as much as you can, as long as you do something with the points you select.
Know what you are monitoring and what affects the points.
Develop your point list to assist you in predictive analysis.
Comprehensive monitoring systems will provide you with the best information.

77 Predictability Trending
Once your monitoring system is installed, select key points to trend Use your trends to develop replacement and PM intervals Items you can trend: Temperatures Pressure Flow rates Usage Time Consumption Load For additional information visit
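One simple way to turn a trend into a replacement or PM interval is to fit a straight line to the logged readings and estimate when it will cross an alarm threshold. The sketch below uses an ordinary least-squares line; the readings, the threshold and the function names are all hypothetical, and a linear fit is only an assumption that may not suit every point you trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching x/y readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_threshold(xs, ys, threshold):
    """Time at which the fitted line reaches the threshold,
    or None if the readings are not rising."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


if __name__ == "__main__":
    # Hypothetical monthly bearing temperatures.
    months = [0, 1, 2, 3]
    temps = [60.0, 62.0, 64.0, 66.0]
    print("month the alarm level is reached:", time_to_threshold(months, temps, 80.0))
```

The same approach applies to any of the trended items listed above, provided the readings are taken consistently enough to compare.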

78 Predictability
Advanced diagnostic techniques: infrared thermal imaging; oil analysis; coolant analysis; fuel analysis; ultrasonic analysis; power quality testing; battery impedance testing; vibration testing; motor analysis; eddy current analysis; laser alignment; balancing

79 Predictability Uses for an IR camera Belt tension Pump alignment
Bearings Electrical connections Turbo chargers Roof leaks Poor insulation Room seals For additional information visit

80 Predictability
Unless you are the Predator you will need to use an IR camera.
Infrared thermography is the process of developing visual images that represent variations in the IR spectrum.
Any object that is above absolute zero emits IR energy.
IR spectrum is between 2.0 and 15 microns.
IR spectrum falls outside the range of the human eye.
IR cameras detect temperature changes that can indicate conditions or stressors that act to decrease the design life of the equipment.
The IR camera can have many uses in a data center.

81 Predictability Overloaded Breaker Fuse Connection Loose Cable
Defective Breaker For additional information visit

82 Predictability Pump Alignment Water Under Roof Tank Level
Missing Insulation For additional information visit

83 Predictability Oil analysis
Oil analysis is used to define three basic machine conditions:
Condition of the oil: lubricant viscosity, acidity, etc.
Lubrication system condition: have physical boundaries been violated (for example, fuel in oil)?
Machine condition: determined by looking for wear particles

84 Predictability Oil analysis
Oil condition is most easily determined by measuring the viscosity, acid number and base number.
Additional tests can determine the presence and/or effectiveness of oil additives such as anti-wear additives, antioxidants, corrosion inhibitors, and anti-foam agents.
Component wear can be determined by measuring the amount of wear metals such as iron, copper, chromium, aluminum, lead, tin and nickel, which can identify when a particular part is wearing.
Contamination is determined by measuring water content, specific gravity, and the level of silicon. A change in specific gravity typically indicates the presence of other oil or fuel contamination.

85 Predictability
Metal | Engines | Gears
Iron | Cylinder heads, rings, gears, crankshafts | Gears, bearings
Chrome | Rings, liners, exhaust valves | Roller bearings
Aluminum | Pistons, thrust bearings, turbo bearings, main bearings | Pump, thrust washers
Nickel | Valve plating, steel alloy from crankshaft, camshafts | Steel alloy from roller bearings
Copper | Lube coolers, main and rod bearings, bushings, turbo bearings | Bushings, thrust plates
Lead | Main and rod bearings, bushings, lead solder | Bushings, grease contamination
Tin | Piston flashing, bearing overlays, bronze alloy | Bearing cage metal
Silver | Wrist pin bushings, silver solder from lube coolers | Silver solder from lube coolers
Titanium | Gas turbine bearings, hubs, turbine blades | N/A

86 Predictability Coolant analysis
Regular coolant testing and routine maintenance can help you achieve maximum system efficiency and save you time and money through less downtime.
A cooling system is subject to pitting, corrosion, cavitation, erosion and electrolysis.
Although coolants are formulated to help prevent these problems from occurring, coolant analysis will identify if they are present and determine if the coolant you're using is providing adequate protection.

87 Predictability Fuel analysis
Fuel analysis can point to solutions for filter plugging, loss of power or poor injector performance. Testing bulk fuel storage tanks can verify compliance with required supplier specifications.

88 Predictability Ultrasonic inspection
Ultrasound consists of sound waves from about 20 kHz to 100 kHz, which cannot be heard by humans. Unlike IR, ultrasound travels only a short distance from the source. Ultrasonic detectors can be used to detect component wear, fluid leaks, vacuum leaks and steam trap failures. Even though such a leak may not be audible to the human ear, its ultrasound is still detectable with the appropriate tool.

89 Predictability Pressure and vacuum leaks can occur in various locations: compressed air, heat exchangers, boilers, condensers, tanks, pipes, valves and steam traps. Ultrasonic inspections can detect these small leaks.

90 Predictability Mechanical systems suffer from wear through constant operation, and ultrasonic inspection can detect wear in these systems. Mechanical applications: bearings, lack of lubrication, pumps, motors, gears/gearboxes, fans and compressors.

91 Predictability Mechanical devices are not the only devices that emit ultrasonic sound. Electrical equipment will also generate ultrasonic waves if arcing, tracking or corona are present. Electrical applications (arcing, tracking and corona): switchgear, transformers, insulators and circuit breakers.

92 Predictability Power quality testing
Hardware and software are frequently blamed for all types of problems that may actually originate from poor power quality within your building's electrical distribution system. In many cases, the number one indication that you have a power quality problem is intermittent, unexplained failures of technology equipment or processes. Responding service technicians may complete a work report with the words "no trouble found".

93 Predictability Impedance testing
Impedance testing is a substitute for performing a full load test. The internal resistance of a cell can be determined by how that cell responds to a momentary load. The instantaneous voltage drop and the load current applied are used to calculate the resistance. Most cell testers can check the impedance with the battery online or offline.
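The calculation described on this slide is Ohm's law applied to the momentary load: resistance equals the voltage drop divided by the load current. A minimal sketch follows; the function name and the sample readings are hypothetical and are not taken from any particular cell tester.

```python
def internal_resistance_ohms(v_open: float, v_loaded: float, load_current_a: float) -> float:
    """Estimate a cell's internal resistance from a momentary load test."""
    if load_current_a <= 0:
        raise ValueError("load current must be positive")
    # Instantaneous voltage drop caused by applying the load.
    voltage_drop = v_open - v_loaded
    return voltage_drop / load_current_a


# Hypothetical readings: 2.25 V at rest, 2.10 V under a 50 A momentary load.
r_cell = internal_resistance_ohms(2.25, 2.10, 50.0)
print(f"{r_cell * 1000:.2f} milliohms")
```

Trending this value for each cell over time is usually more informative than a single reading, since a rising resistance suggests a deteriorating cell.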

94 Predictability Vibration analysis
The level and frequency of the vibration of rotating machinery are not distinguishable to the human touch. Vibration analysis can be used to discover and diagnose a wide range of problems related to rotating equipment.

95 Predictability Vibration monitoring
Vibration monitoring can detect: unbalance, eccentric rotors, misalignment, and mechanical looseness or weakness. Types of systems that vibration analysis should be performed on: generators, cooling tower fans, chillers, pumps, CRAH/CRAC units and air handlers.

96 Predictability Tests used to perform motor analysis
Tests: infrared, vibration analysis, surge comparison and motor current signature comparison. Motor faults or conditions that can be detected: winding short circuits, open coils, improper torque settings, as well as other mechanical problems.

97 Predictability Types of motor analysis
Surge comparison testing identifies insulation deterioration by applying a high-frequency transient surge to equal parts of a winding and comparing the resulting voltage waveforms. Motor Current Signature Analysis (MCSA) provides a non-intrusive method of detecting mechanical and electrical problems.

98 Predictability Eddy current analysis
Detects surface and subsurface defects. Detects variations in alloy, heat treatment, hardness, structure and other physical metallurgical conditions. Should be done on chillers each year when the tubes are being cleaned.

99 Predictability Alignment inspection
Shafts and pumps should be properly aligned, which is best accomplished by using laser alignment. When machines are improperly aligned, added loads are placed on the bearings and couplings, which can result in early and unplanned failures.

100 Predictability Balance
Proper balance reduces wear and tear on bearings, shafts and motors. Imbalance can be detected with infrared cameras and vibration meters. Balancing equipment is required to verify and correct balance.

101 SCALABILITY Scalability is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged without impact to operations. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added.

102 Scalability What do we want… a flexible, scalable, reliable, highly performing, and highly available computer infrastructure that adapts to a wide range of continuously evolving and challenging demands

103 Scalability What does it take?
Requirements analysis; Basis of Design (BOD); design; modular approach; avoid excessive equipment; pay as you go; expansion techniques.

104 Scalability Good planning and decisions are the foundation of a highly scalable facility. At no point in the lifecycle of a mission-critical facility can you have greater impact on scalability than during the design phase. Start with a Requirements Analysis (RA) of your data center needs. Use the results of your RA to develop a Basis of Design (BOD). The RA and BOD are living documents, and you need to update them as changes occur.

105 Scalability Requirements analysis
Growth modeling takes the hardware platform requirements and turns them into space, power and cooling requirements. It considers both current and future technology impacts on space, power and cooling. It is typically done for 3+ year planning. This leads to the critical infrastructure's BOD.
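As a rough illustration of growth modeling, the sketch below turns a rack count into power, cooling and floor-space figures over a planning horizon. Every input (rack count, kW per rack, square feet per rack, growth rate) is a hypothetical placeholder, and the model assumes all IT power becomes heat that the cooling plant must remove; a real requirements analysis would use measured platform data.

```python
import math
from dataclasses import dataclass

KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, roughly 3.517 kW


@dataclass
class YearForecast:
    year: int
    racks: int
    it_load_kw: float
    cooling_tons: float
    floor_area_sqft: float


def growth_model(racks_now, kw_per_rack, sqft_per_rack, annual_growth, years):
    """Project a rack count into power, cooling and space needs, year by year."""
    forecasts = []
    for year in range(years + 1):
        # Round before ceil so floating-point noise does not add a phantom rack.
        racks = math.ceil(round(racks_now * (1 + annual_growth) ** year, 6))
        it_load_kw = racks * kw_per_rack
        # Assumption: all IT power ends up as heat the cooling plant must remove.
        cooling_tons = it_load_kw / KW_PER_TON
        forecasts.append(
            YearForecast(year, racks, it_load_kw, cooling_tons, racks * sqft_per_rack)
        )
    return forecasts


# Hypothetical inputs: 100 racks today, 5 kW and 25 sq ft per rack, 20% growth, 3 years.
plan = growth_model(100, 5.0, 25.0, 0.20, 3)
for row in plan:
    print(row)
```

The final-year row is the kind of figure that would feed the BOD, while the intermediate rows suggest when each expansion step is likely to be needed.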

106 Scalability Basis of Design
The BOD is the roadmap to a reliable and quality-designed site. More often than not, the BOD is lacking in detail. It defines the requirements of the site. It defines the reliability, availability, maintainability, scalability and operational parameters. It should be updated regularly.

107 Scalability Designing with scalability in mind
Scalability: reduced initial cost; reduced time to install equipment; reduces the requirement to purchase large systems; not an advantage for fast-growing facilities. Modular design can be more precisely matched to reflect: lower capital investment; a "pay as you go" approach; budget/capital constraints; controlled growth; unanticipated growth.

108 Scalability Equipment rooms
When possible, design equipment rooms with space for expansion. Design hallways, corridors and doors to allow access for new equipment. Conserve wall space for future panels and equipment.

109 Scalability Switchgear
Expansion breakers. Expansion cells. Be aware of bussing configuration; use fully-rated bus throughout. Use larger frame breakers with adjustable trips. Have expansion in your Programmable Logic Controller (PLC). Have access to programming codes. Have current backup.

110 Scalability UPS systems
Size the parallel cabinet and static switch for full build-out. If modules are upgradeable, size feeders to full build-out. If equipped with a sync control cabinet, size it for full build-out. Remember: when you start to add more than 3 modules in parallel, the redundancy begins to drop.
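One possible reading of the remark about parallel modules is that an N+1 system becomes less likely to ride through failures as the module count grows, because there are more modules that can fail while only one is spare. The sketch below illustrates that reading with a standard k-of-n binomial model. The slide does not state which model it has in mind, and the 0.99 per-module reliability and the independence of failures are assumptions made purely for illustration.

```python
from math import comb


def k_of_n_reliability(n_modules: int, k_required: int, module_reliability: float) -> float:
    """Probability that at least k of n identical, independent modules are working."""
    r = module_reliability
    return sum(
        comb(n_modules, working) * r**working * (1 - r) ** (n_modules - working)
        for working in range(k_required, n_modules + 1)
    )


# N+1 configurations: the load needs n - 1 modules, and one more is installed.
for n in range(2, 7):
    print(n, "modules:", round(k_of_n_reliability(n, n - 1, 0.99), 6))
```

Under these assumptions the computed reliability falls steadily as modules are added, which is consistent with the caution on the slide.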

111 Scalability Critical distribution
Dual main input: allows for the possibility of a second source to supply load during cutover or expansion activities; could be used to connect temporary equipment for emergencies; load bank testing. Spare breakers: allow for additional PDUs and expected new load. Up-frame the breaker so that larger loads may be added, e.g. use 400A frame breakers with 225A rating plugs to power PDUs.

112 Scalability Power Distribution Units (PDUs)
Typically you run out of circuits before capacity. Install a junction box below the floor to allow for additional power whips; bottom plates usually do not have enough knock-outs. Order PDUs with additional 225A sub-feed breakers to support an additional Remote Power Panel (RPP). Consider in-row PDUs to save space.

113 Scalability EPO systems
Plan on the fact that the EPO system will have items added and removed from it. The EPO should be an engineered device and not a cloud on the drawings stating "by others". The system should be documented. It should have Active, Test and Off modes of operation. It should be installed with isolation relays. It should be centrally located in an EPO control cabinet with room for expansion.

114 Scalability Chilled water systems
When possible, up-size piping. Have additional valves installed under the floor so you can add CRAH units as needed. Have valves installed for additional pumps and chillers. Have a valve connection that can be easily hooked up to a temporary chiller.

115 Scalability Monitoring systems
Make sure that the system is expandable. Some systems are not upgradable, while others require adding another module to the communication trunk. Make sure you will not be locked in with an uncooperative manufacturer. Have access to the programming function and required passwords.

116 Scalability Expansion techniques
Implementation of new systems while the facility is in "production" is a business reality. The need for hot cutover occurs more often. For safety reasons, hot cutover should be a last resort. With proper upfront planning, the need for hot taps and cutovers can be reduced or eliminated.

117 UPTIME Uptime (Ŷ) is a measure of the time a system has been "up", running and available. It came into use to describe the opposite of downtime, times when a system was not operational.
ρ = Reliability, ά = Availability, ц = Maintainability, ∏ = Predictability, ∑ = Scalability

Reliability (ρ) is the ability of a system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances.

Availability (ά) is the ability of a system to tolerate failures. It refers to the time that a system is available to its users. This means the process continues to be served through the failure and that, ideally, the failure is transparent to the user.

Maintainability (ц) is defined as the probability of performing a successful repair action or preventative maintenance within a given time. In other words, maintainability measures the ease and speed with which a system can be restored to operational status.
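The "probability of a successful repair within a given time" can be made concrete with a simple model. The sketch below assumes repair times are exponentially distributed around a mean time to repair (MTTR), which gives M(t) = 1 - exp(-t / MTTR). That distribution is an illustrative assumption and is not stated in the presentation; the 4-hour MTTR is hypothetical.

```python
import math


def maintainability(t_hours: float, mttr_hours: float) -> float:
    """Probability that a repair completes within t_hours (exponential repair model)."""
    if mttr_hours <= 0:
        raise ValueError("MTTR must be positive")
    return 1.0 - math.exp(-t_hours / mttr_hours)


# Hypothetical system with a 4-hour mean time to repair.
for window in (1, 4, 8, 24):
    print(f"repair done within {window:>2} h: {maintainability(window, 4.0):.3f}")
```

Shortening the MTTR through spares on site, good documentation and trained staff raises the probability for every repair window.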

Predictability (∏) is the ability to detect the onset of a system failure before it happens. Predictive analysis can be performed by: reviewing PM data, conducting failure analysis, monitoring systems, trending, and advanced diagnostics.

Scalability (∑) is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added.

123 UPTIME ρ * ά * ц * ∏ * ∑ = Ŷ
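The product form means a weakness in any single factor caps the overall result. A minimal sketch of that behaviour follows, assuming each factor is expressed as a score between 0 and 1. The presentation does not say how the factors are quantified, so the scoring scale, the function name and the sample values are all hypothetical.

```python
from math import prod


def uptime_score(reliability, availability, maintainability, predictability, scalability):
    """Multiply the five RAMPS factors, each assumed to be a score in [0, 1]."""
    factors = (reliability, availability, maintainability, predictability, scalability)
    for factor in factors:
        if not 0.0 <= factor <= 1.0:
            raise ValueError("each factor must be between 0 and 1")
    return prod(factors)


# Hypothetical site: strong design, but weak predictive maintenance.
score = uptime_score(0.99, 0.99, 0.95, 0.70, 0.95)
print(f"overall score: {score:.3f}")
```

Because the factors multiply, the overall score can never exceed the lowest individual factor, which is the point of looking beyond the facility design alone.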

Be sure to look at more than just the design of your facility… don't miss a step. Use RAMPS to achieve maximum uptime!

125 Presented by Joe Soroka
RAMPS© Reliability, Availability, Maintainability, Predictability, Scalability
