Reliability-Centered Maintenance (RCM)

Introduction

With a few exceptions, preventive maintenance has been considered the most advanced and effective maintenance technique available for use by industrial and facility maintenance organizations. A Preventive Maintenance (PM) program is based on the assumption of a "fundamental cause-and-effect relationship between scheduled maintenance and operating reliability. This assumption was based on the intuitive belief that because mechanical parts wear out, the reliability of any equipment [is] directly related to operating age. It therefore followed that the more frequently equipment was overhauled, the better protected it was against the likelihood of failure. The only problem was in determining what age limit was necessary to assure reliable operation."

Nowlan and Heap reached the conclusion that, "a maintenance policy based exclusively on some maximum operating age would, no matter what the age limit, have little or no effect on the failure rate."

In separate independent studies, it was noted that a difference existed between the perceived and the intrinsic design life for the majority of equipment and components. In fact, it was discovered that in many cases equipment greatly exceeded the perceived or stated design life.

Reliability-Centered Maintenance (RCM) is the optimum mix of reactive, time- or interval-based, condition-based, and proactive maintenance practices. The basic application of each strategy is shown in Figure 1. These principal maintenance strategies, rather than being applied independently, are integrated to take advantage of their respective strengths in order to maximize facility and equipment reliability while minimizing life-cycle costs.

Components of an RCM program. The first component is Reactive. The reactive component's attributes are: small items, non-critical, inconsequential, unlikely to fail, and redundant. The second component is Interval (Preventive Maintenance). Interval's attributes are: subject to wear-out, consumable replacement, failure pattern known. Condition-Based Maintenance is the third component. Its attributes are: random failure patterns, not subject to wear, and PM-induced failures. Proactive is the fourth component. Its attributes are: Root Cause Failure Analysis, age exploration, Failure Modes and Effects Analysis, and acceptance testing.

Figure 1. Components of an RCM Program.

RCM includes reactive, time-based, condition-based, and proactive tasks. In addition, a user should understand system boundaries and facility envelopes, system/equipment functions, functional failures, and failure modes, all of which are critical components of the RCM program.

Description

Preventive Maintenance (PM) assumes that failure probabilities can be determined statistically for individual machines and components, and parts can be replaced or adjustments can be performed in time to preclude failure. For example, a common practice has been to replace or renew bearings after so many operating hours assuming that bearing failure rate increases with time in service.

Figure 2, Bearing Life Scatter, shows the failure distribution of a group of thirty identical 6309 deep groove ball bearings installed on bearing life test machines and run to failure. The wide variation in bearing life is obvious and precludes the use of any effective time-based maintenance strategy.

Bearing life scatter bar graph. Graph shows bearing numbers (1 to 30) by revolutions (0 to 300 millions of revs). Graph shows the results to vary wildly.

Figure 2. Bearing Life Scatter. Credit: Eschmann, et al., Ball and Roller Bearings: Theory, Design and Application, 3rd Edition, John Wiley & Sons, 1999.

Fortunately, computer advances in the 1990s have made it possible in many cases to identify the precursors of failure, quantify equipment condition, and schedule the appropriate repair with a higher degree of confidence than was possible when performing strictly interval-based maintenance relying upon usually erroneous estimates of when a component might fail. Also, it has been discovered recently that there are many different equipment failure characteristics, only a small number of which are age- or use-related. This new knowledge has increased the emphasis on Condition Monitoring (CM), often referred to as Condition-Based Maintenance, which has caused a reduction in the reliance upon time-based PM.

It should not be inferred from the above that all interval-based maintenance should be replaced by condition-based maintenance. In fact, interval-based maintenance is appropriate where abrasive, erosive, or corrosive wear takes place; where material properties change due to fatigue, embrittlement, etc.; and/or where a clear correlation between age and functional reliability exists.

In addition, for those systems or components where no failure consequences in terms of mission, environment, safety, security, or Life-Cycle Cost (LCC) exist, maintenance should not be performed, i.e., the equipment should be run to failure and replaced.

The concept of RCM has been adopted across several government and industry operations as a strategy for performing maintenance. RCM applies maintenance strategies based on consequence and cost of failure. In addition, RCM seeks to minimize maintenance and improve reliability throughout the life-cycle by using proactive techniques such as improved design specifications, integration of condition monitoring in the commissioning process, and the Age Exploration (AE) process.

A. RCM Principles

The primary RCM principles are:

RCM is Function Oriented—RCM seeks to preserve system or equipment function, not just operability for operability's sake. Redundancy of function, through multiple pieces of equipment, improves functional reliability but increases life-cycle cost in terms of procurement and operating costs.
RCM is System Focused—RCM is more concerned with maintaining system function than with individual component function.
RCM is Reliability Centered—RCM treats failure statistics in an actuarial manner. The relationship between operating age and the failures experienced is important. RCM is not overly concerned with simple failure rate; it seeks to know the conditional probability of failure at specific ages (the probability that failure will occur in each given operating age bracket).
RCM Acknowledges Design Limitations—RCM objective is to maintain the inherent reliability of the equipment design, recognizing that changes in inherent reliability are the province of design rather than of maintenance. Maintenance can, at best, only achieve and maintain the level of reliability for equipment that was provided for by design. However, RCM recognizes that maintenance feedback can improve on the original design. In addition, RCM recognizes that a difference often exists between the perceived design life and the intrinsic or actual design life and addresses this through the Age Exploration (AE) process.
RCM is Driven by Safety, Security, and Economics—Safety and security must be ensured at any cost; thereafter, cost-effectiveness becomes the criterion.
RCM Defines Failure as "Any Unsatisfactory Condition"—Therefore, failure may be either a loss of function (operation ceases) or a loss of acceptable quality (operation continues but impacts quality).
RCM Uses a Logic Tree to Screen Maintenance Tasks—This provides a consistent approach to the maintenance of all kinds of equipment.
RCM Tasks Must Be Applicable—The tasks must address the failure mode and consider the failure mode characteristics.
RCM Tasks Must Be Effective—The tasks must reduce the probability of failure and be cost-effective.
RCM Acknowledges Three Types of Maintenance Tasks—These tasks are time-directed (PM), condition-directed (CM), and failure finding (one of several aspects of Proactive Maintenance). Time-directed tasks are scheduled when appropriate. Condition-directed tasks are performed when conditions indicate they are needed. Failure-finding tasks detect hidden functions that have failed without giving evidence of pending failure. Additionally, performing no maintenance, Run-to-Failure, is a conscious decision and is acceptable for some equipment.
RCM is a Living System—RCM gathers data from the results achieved and feeds this data back to improve design and future maintenance. This feedback is an important part of the Proactive Maintenance element of the RCM program.

B. Types of RCM

There are several ways to conduct and implement an RCM program. The program can be based on rigorous Failure Modes and Effects Analysis (FMEA), complete with mathematically-calculated probabilities of failure based on design or historical data, intuition or common-sense, and/or experimental data and modeling. These approaches may be called Classical, Rigorous, Intuitive, Streamlined, or Abbreviated. Other terms sometimes used for these same approaches include Concise, Preventive Maintenance (PM) Optimization, Reliability Based, and Reliability Enhanced. All are applicable. The decision of what technique to use should be left to the end user and be based on:

Consequences of failure
Probability of failure
Historical data available
Risk tolerance
Resource availability

Classical/Rigorous RCM

Benefits: Classical or rigorous RCM provides the most knowledge and data concerning system functions, failure modes, and maintenance actions addressing functional failures of any of the RCM approaches. Rigorous RCM analysis is the method first proposed and documented by Nowlan and Heap and later modified by John Moubray, Anthony M. Smith, and others. In addition, this method should produce the most complete documentation of all the methods addressed here.
Concerns: Classical or rigorous RCM historically has been based primarily on the FMEA with little, if any, analysis of historical performance data. In addition, rigorous RCM analysis is extremely labor intensive and often postpones the implementation of obvious condition monitoring tasks.
Applications: This approach should be limited to the following three situations:
- The consequences of failure result in catastrophic risk in terms of environment, health, or safety, and/or complete economic failure of the business unit.
- The resultant reliability and associated maintenance cost is still unacceptable after performing and implementing a streamlined type FMEA.
- The system/equipment is new to the organization and insufficient corporate maintenance and operational knowledge exists on function and functional failures.

Abbreviated/Intuitive/Streamlined RCM

Benefits: The intuitive approach identifies and implements the obvious, usually condition-based, tasks with minimal analysis. In addition, it culls or eliminates low-value maintenance tasks based on historical data and Maintenance and Operations (M&O) personnel input. The intent is to minimize the initial analysis time in order to realize early wins that help offset the cost of the FMEA and condition monitoring capabilities development.
Concerns: Reliance on historical records and personnel knowledge can introduce errors into the process that may lead to missing hidden failures where a low probability of occurrence exists. In addition, the intuitive process requires that at least one individual has a thorough understanding of the various condition monitoring technologies.
Applications: This approach should be utilized when:
- The function of the system/equipment is well understood.
- Functional failure of the system/equipment will not result in loss of life or catastrophic impact on the environment or business unit.

For these reasons, the streamlined or intuitive approach has been recommended for DOS, NASA, and NAVFAC facilities. In addition, a streamlined or intuitive approach has been successfully used in both discrete and continuous manufacturing facilities.

C. RCM Analysis

The RCM analysis should carefully consider and answer the following questions:

What does the system or equipment do; what are the functions?
What functional failures are likely to occur?
What are the likely consequences of these functional failures?
What can be done to reduce the probability of the failure(s), identify the onset of failure(s), or reduce the consequences of the failure(s)?

Answers to these four questions can be used with the decision logic tree depicted in Figure 3, Reliability-Centered Maintenance (RCM) Decision Logic Tree, to determine the maintenance approach for the equipment item or system.

RCM logic tree. Question #1: Will the failure have a direct and adverse effect on environment, health, security, or safety? If you answered no to Question #1 - Question #2: Will the failure have a direct and adverse effect on mission (quantity or quality)? If you answered yes to Question #1 - Question #3: Is there an effective CM technology or approach? If you answered no to Question #2 - Question #4: Will the failure result in other economic loss (high-cost damage to machines or system)? If you answered yes to Question #2 - see Question #3. If you answered no to Question #4 - Question #5: Candidate for run-to-fail? If you answered yes to Question #4 - see Question #3. If you answered no to Question #3 - Question #6: Is there an effective interval-based task? If you answered yes to Question #3 - Develop and schedule a CM task to monitor condition, and perform the condition-based task. If you answered no to Question #6 - Redesign the system, accept the failure risk, or install redundancy. If you answered yes to Question #6 - Develop and schedule an interval-based task.

Figure 3. Reliability Centered Maintenance (RCM) Logic Tree

Note that the analysis process as depicted in Figure 3 has only four possible outcomes:

Perform Condition-Based actions (CM).
Perform Interval (Time- or Cycle-) Based actions (PM).
Determine that no maintenance action will reduce the probability of failure, and then redesign the system, accept the failure risk, or install redundancy.
Perform no action and choose to repair following failure (Run-to-Failure).
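
The decision logic of Figure 3 can be sketched in a few lines of Python. This is a minimal illustration rather than an official implementation: the function name, parameter names, and the `Outcome` enumeration are hypothetical, and Question 5 ("Candidate for run-to-fail?") is treated as a terminal outcome because the figure description gives no further branches for it.

```python
from enum import Enum


class Outcome(Enum):
    CONDITION_BASED = "Develop and schedule a CM task; perform condition-based task"
    INTERVAL_BASED = "Develop and schedule an interval-based task"
    REDESIGN_ACCEPT_OR_REDUNDANCY = "Redesign, accept the failure risk, or install redundancy"
    RUN_TO_FAILURE = "Candidate for run-to-failure"


def rcm_decision(safety_effect, mission_effect, economic_loss,
                 effective_cm, effective_interval_task):
    """Walk the Figure 3 questions for one failure and return an Outcome.

    All arguments are booleans supplied by the analyst (hypothetical names).
    """
    # Questions 1, 2 and 4: a "yes" to any of them leads to Question 3.
    consequential = safety_effect or mission_effect or economic_loss
    if not consequential:
        # Question 5 (assumed here to be a terminal outcome).
        return Outcome.RUN_TO_FAILURE
    # Question 3: is there an effective CM technology or approach?
    if effective_cm:
        return Outcome.CONDITION_BASED
    # Question 6: is there an effective interval-based task?
    if effective_interval_task:
        return Outcome.INTERVAL_BASED
    return Outcome.REDESIGN_ACCEPT_OR_REDUNDANCY


# Example: a mission-critical failure with no usable CM technology
# but a known wear-out interval.
result = rcm_decision(safety_effect=False, mission_effect=True,
                      economic_loss=False, effective_cm=False,
                      effective_interval_task=True)
print(result.name)  # INTERVAL_BASED
```

Note that condition monitoring is checked before interval-based tasks, which mirrors the order of Questions 3 and 6 in the figure.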

D. Failure

Failure is the cessation of proper function or performance. RCM examines failure at several levels: the system level, sub-system level, component level, and sometimes even the parts level. The goal of an effective maintenance organization is to provide the required system performance at the lowest cost. This means that the maintenance approach must be based on a clear understanding of failure at each of the system levels. System components can be degraded or even failed and still not cause a system failure. A simple example is the failed headlamp on an automobile. That failed component has little effect on the overall system performance. Conversely, several degraded components may combine to cause the system to fail, even though no individual component has itself failed.

System and System Boundary

A system is any user-defined group of components, equipment, or facilities that support an operational requirement. These operational requirements are defined by mission criticality or by environmental, health, safety, regulatory, quality, or other agency/business defined requirements. Most systems can be divided into unique sub-systems along user-defined boundaries. The boundaries are selected as a method of dividing a system into subsystems when its complexity makes an analysis by other means difficult:

A system boundary or interface definition contains a description of the inputs and outputs that cross each boundary.
The facility envelope is the physical barrier created by a building, enclosure, or other structure; e.g., a cooling tower or tank.
Standardize on selecting boundaries. For example, a pump could include the first upstream/downstream isolation valve, the coupling, and associated gauges. The motor would include the electrical circuit from the load side of the motor control center but not the coupling.

The intent is to develop a series of modular FMEAs and assemble them as if they were Lego® Blocks and select the maintenance actions based on the consequences of risk determined by the criticality and probability factors defined in Tables 1 and 2 respectively.

Function and Functional Failure

The function defines the performance expectation and can have many elements. Elements include physical properties, operation performance including output tolerances, and time requirements such as continuous operation or limited required availability.

Functional failures are descriptions of the various ways in which a system or subsystem can fail to meet the functional requirements designed into the equipment. A system or subsystem that is operating in a degraded state but does not impact any of the requirements addressed in System and System Boundary has not experienced a functional failure.

It is important to determine all of the functions of an item that are significant in a given operational context. By clearly defining the functions' non-performance, the functional failure becomes clearly defined. For example, it is not enough to define the function of a pump as moving water. The function of the pump must be specific and defined in such terms as flow rate, discharge pressure, vibration levels, B₁₀ (L₁₀) life, efficiency, etc. (Reliability HotWire)

Failure Modes

Failure modes are equipment- and component-specific failures that result in the functional failure of the system or subsystem. For example, a machinery train composed of a motor and pump can fail catastrophically due to the complete failure of the windings, bearings, shaft, impeller, controller, or seals. In addition, a functional failure also occurs if the pump performance degrades such that there is insufficient discharge pressure or flow to meet operating requirements. These operational requirements should be considered when developing maintenance tasks.

Dominant failure modes are those failure modes responsible for a significant proportion of all the failures of the item. They are the most common modes of failure.

Not all failure modes or causes warrant preventive or condition-based maintenance, because the likelihood of their occurring is remote or their effect is inconsequential.

Reliability

Reliability is the probability that an item will survive a given operating period, under specified operating conditions, without failure. It is usually expressed as B₁₀ (L₁₀) life and/or Mean Time to Failure (MTTF) or Mean Time Between Failures (MTBF). The conditional probability of failure measures the probability that an item entering a given age interval will fail during that interval. If the conditional probability of failure increases with age, the item shows wear-out characteristics. The conditional probability of failure reflects the overall adverse effect of age on reliability. It is not a measure of the change in an individual equipment item.
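
As a rough illustration of the definition above, the conditional probability of failure for an age bracket can be estimated as the number of items that fail in the bracket divided by the number of items that enter it. The sketch below assumes every item in the sample was run to failure (no suspended or still-running units); the function name and the failure ages are hypothetical, not data from any study cited here.

```python
def conditional_failure_probability(failure_ages, bracket_width, n_brackets):
    """Estimate the conditional probability of failure per age bracket.

    For each bracket [start, start + bracket_width), returns
    (items failing in the bracket) / (items surviving to its start),
    or None when no item survives to the start of the bracket.
    Assumes all items were run to failure.
    """
    result = []
    for i in range(n_brackets):
        start = i * bracket_width
        end = start + bracket_width
        entering = sum(1 for age in failure_ages if age >= start)
        failing = sum(1 for age in failure_ages if start <= age < end)
        result.append(failing / entering if entering else None)
    return result


# Hypothetical ages at failure, in operating hours.
ages = [120, 450, 800, 950, 1100, 1300, 1700, 1900, 2400, 2800]
probs = conditional_failure_probability(ages, bracket_width=1000, n_brackets=3)
for i, p in enumerate(probs):
    print(f"{i * 1000}-{(i + 1) * 1000} h: {p:.2f}")
```

In this made-up sample the estimate rises from bracket to bracket, which is the wear-out signature described above. With a complete run-to-failure sample the last occupied bracket always evaluates to 1.0, so real analyses need larger populations and a treatment of units that have not yet failed.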

Failure rate or frequency plays a relatively minor role in maintenance programs because it is too simple a measure. Failure frequency is useful in making cost decisions and determining maintenance intervals, but it tells nothing about which maintenance tasks are appropriate or about the consequences of failure. A maintenance solution should be evaluated in terms of the safety, security, or economic consequences it is intended to prevent. A maintenance task must be applicable (i.e., prevent failures or ameliorate failure consequences) in order to be effective.

Failure Characteristics

Conditional probability of failure (P_cond) curves fall into six basic types, as graphed (P_cond versus Time) in Figures 4 and 5, Random Conditional Probability of Failure Curves and Age-Related Conditional Probability of Failure Curves. The percentage of equipment conforming to each of the six wear patterns as determined in three separate studies is also shown in both figures.

The failure characteristics shown in Figures 4 and 5 were first noted in the previously cited book, Reliability-Centered Maintenance. Follow-on studies in Sweden in 1973, and by the U.S. Navy in 1983, produced similar results. In these three studies, random failures accounted for 77–92% of the total failures and age-related failure characteristics accounted for the remaining 8–23%.

Random conditional probability of failure curves. First curve beginning at 0 and moving upward and then flattening out: UAL 1968 7%, BROMBERG 1973 11%, and U.S. NAVY 6%. Second curve remains a flat line: UAL 1968 14%, BROMBERG 1973 15%, and U.S. NAVY 42%. Third curve begins high and moves down, then flattens out: UAL 1968 68%, BROMBERG 1973 66%, and U.S. NAVY 29%.

Figure 4. Random conditional probability of failure curves

Age-related conditional probability of failure curves. First curve is shaped like a bowl: UAL 1968 4%, BROMBERG 1973 3%, and U.S. NAVY 3%. Second curve begins flat then curves upward: UAL 1968 2%, BROMBERG 1973 1%, and U.S. NAVY 17%. Third curve is a 30° diagonal: UAL 1968 5%, BROMBERG 1973 4%, and U.S. NAVY 3%.

Figure 5. Age-related conditional probability of failure curves
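
The 77–92% and 8–23% ranges quoted above follow directly from the percentages listed for the two groups of curves. The short check below simply transcribes those figures and sums them per study; the variable names are illustrative.

```python
# Percentages transcribed from the curve descriptions for Figures 4 and 5.
random_patterns = {
    "UAL 1968": [7, 14, 68],
    "Bromberg 1973": [11, 15, 66],
    "U.S. Navy": [6, 42, 29],
}
age_related_patterns = {
    "UAL 1968": [4, 2, 5],
    "Bromberg 1973": [3, 1, 4],
    "U.S. Navy": [3, 17, 3],
}

# (random total, age-related total) for each study.
totals = {
    study: (sum(random_patterns[study]), sum(age_related_patterns[study]))
    for study in random_patterns
}
for study, (random_pct, age_pct) in totals.items():
    print(f"{study}: random {random_pct}%, age-related {age_pct}%")
# UAL 1968: random 89%, age-related 11%
# Bromberg 1973: random 92%, age-related 8%
# U.S. Navy: random 77%, age-related 23%
```

Each study's six percentages sum to 100, so the two groups of curves between them account for all of the equipment surveyed.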

The basic difference between the failure patterns of complex and simple items has important implications for maintenance. Single-piece and simple items frequently demonstrate a direct relationship between reliability and age. This is particularly true where factors such as metal fatigue or mechanical wear are present or where the items are designed as consumables (short or predictable life spans). In these cases an age limit based on operating time or stress cycles may be effective in improving the overall reliability of the complex item of which they are a part.

Complex items frequently demonstrate some infant mortality, after which their failure probability increases gradually or remains constant. A marked wear-out age is not common. In many cases scheduled overhaul increases the overall failure rate by introducing a high infant mortality rate into an otherwise stable system.

Preventing Failure

Every equipment item has a characteristic that can be called resistance to or margin to failure. Using equipment subjects it to stress that can result in failure when the stress exceeds the resistance to failure. Figure 6, Preventing Failure, depicts this concept graphically. The figure shows that failures may be prevented or item life extended by:

Decreasing the amount of stress applied to the item. The life of the item is extended for the period f₀-f₁ by the stress reduction shown.
Increasing or restoring the item's resistance to failure. The life of the item is extended for the period f₁-f₂ by the resistance increase shown.
Decreasing the rate of degradation of the item's resistance to or margin to failure. The life of the item is extended for the period f₂-f₃ by the decreased rate of resistance degradation shown.

Figure 6. Preventing failure
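
The three levers listed above can be illustrated with a deliberately simple model. The sketch below assumes a constant applied stress and a resistance that degrades linearly with use; neither assumption comes from Figure 6, and the function name, units, and numbers are hypothetical.

```python
def life_until_failure(initial_resistance, stress, degradation_rate):
    """Time until a linearly degrading resistance falls to the applied stress.

    Assumed model: resistance(t) = initial_resistance - degradation_rate * t,
    with failure when resistance(t) <= stress.
    """
    if degradation_rate <= 0:
        raise ValueError("degradation_rate must be positive in this model")
    if stress >= initial_resistance:
        return 0.0  # the item fails as soon as the stress is applied
    return (initial_resistance - stress) / degradation_rate


# Hypothetical baseline, in arbitrary units.
baseline = life_until_failure(initial_resistance=100, stress=60, degradation_rate=4)

# Lever 1: decrease the stress applied to the item.
less_stress = life_until_failure(100, 40, 4)
# Lever 2: increase or restore the item's resistance to failure.
more_resistance = life_until_failure(120, 60, 4)
# Lever 3: decrease the rate at which resistance degrades.
slower_degradation = life_until_failure(100, 60, 2)

print(baseline, less_stress, more_resistance, slower_degradation)
# 10.0 15.0 15.0 20.0
```

Under these assumptions each lever lengthens the interval before stress exceeds resistance, which is the qualitative point of the figure. Real stress and degradation histories are rarely this regular.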

Stress is dependent on use and may be highly variable. It may increase, decrease, or remain constant with use or time. A review of the failures of a large number of nominally identical simple items would disclose that the majority had about the same age at failure, subject to statistical variation, and that these failures occurred for the same reason. If one is considering preventive maintenance for some simple item and can find a way to measure its resistance to failure, he or she can use that information to help select a preventive task.

Adding excess material or changing the type of material that wears away or is consumed can increase resistance to failure or decrease the rate of degradation. Excess strength may be provided to compensate for loss from corrosion or fatigue. The most common method of restoring resistance is by replacing the item. The resistance to failure of a simple item decreases with use or time (age), but a complex unit consists of hundreds of interacting simple items (parts) and has a considerable number of failure modes. In the complex case, the mechanisms of failure are the same, but they are operating on many simple component parts simultaneously and interactively so that failures no longer occur for the same reason at the same age. For these complex units, it is unlikely that one can design a maintenance task unless there are a few dominant or critical failure modes.

Failure Modes and Effects Analysis (FMEA)

FMEA is applied to each system, sub-system, and component identified in the boundary definition. For every function identified, there can be multiple failure modes. The FMEA addresses each system function (and, since failure is the loss of function, all possible failures) and the dominant failure modes associated with each failure, and then examines the consequences of the failure. What effect did the failure have on the mission or operation, the system, and on the machine?

Even though there are multiple failure modes, often the effects of the failure are the same or very similar in nature. That is, from a system function perspective, the outcome of any component failure may result in the system function being degraded.

Likewise, similar systems and machines will often have the same failure modes. However, the system use will determine the failure consequences. For example, the failure modes of a ball bearing will be the same regardless of the machine. However, the dominant failure mode will often change from machine to machine, the cause of the failure may change, and the effects of the failure will differ.

Figure 7, FMEA Worksheet, provides an example of a FMEA worksheet.

Figure 7. FMEA Worksheet
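
One way to picture the worksheet is as a list of records, one per failure mode, each tied back to a function and a functional failure and carrying the three levels of effect named above (mission or operation, system, machine). The field names below are an assumption made for illustration, not the actual columns of Figure 7, and the pump entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    """One worksheet line (hypothetical layout)."""
    function: str
    functional_failure: str
    failure_mode: str
    effect_on_mission: str
    effect_on_system: str
    effect_on_machine: str


def modes_by_functional_failure(rows):
    """Group failure modes under the functional failure they cause."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.functional_failure].append(row.failure_mode)
    return dict(grouped)


FUNCTION = "Deliver water at the required flow and discharge pressure"
rows = [
    FmeaRow(FUNCTION, "No flow", "Motor winding failure",
            "Process stops", "Loop loses circulation", "Motor must be replaced"),
    FmeaRow(FUNCTION, "No flow", "Seized bearing",
            "Process stops", "Loop loses circulation", "Shaft and bearing damage"),
    FmeaRow(FUNCTION, "Insufficient discharge pressure", "Worn impeller",
            "Reduced output", "Flow below requirement", "Impeller must be replaced"),
]

print(modes_by_functional_failure(rows))
```

Grouping the rows this way makes the earlier observation visible: several distinct failure modes can produce the same functional failure and the same system-level effect.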

E. Criticality and Probability of Occurrence

Criticality assessment provides the means for quantifying how important a system function is relative to the identified Mission. Table 1, Criticality/Severity Categories, provides a method for ranking system criticality. This system, adapted from the automotive industry, provides 10 categories of Criticality/Severity. It is not the only method available. The categories can be expanded or contracted to produce a site-specific listing.

Table 1. Criticality/Severity Categories

Ranking	Effect	Comment
1	None	No reason to expect failure to have any effect on safety, health, environment, or mission.
2	Very Low	Minor disruption to facility function. Repair to failure can be accomplished during trouble call.
3	Low	Minor disruption to facility function. Repair to failure may be longer than trouble call but does not delay mission.
4	Low to Moderate	Moderate disruption to facility function. Some portion of mission may need to be reworked or process delayed.
5	Moderate	Moderate disruption to facility function. 100% of mission may need to be reworked or process delayed.
6	Moderate to High	Moderate disruption to facility function. Some portion of mission is lost. Moderate delay in restoring function.
7	High	High disruption to facility function. Some portion of mission is lost. Significant delay in restoring function.
8	Very High	High disruption to facility function. All of mission is lost. Significant delay in restoring function.
9	Hazard	Potential safety, health, or environmental issue. Failure will occur with warning.
10	Hazard	Potential safety, health, or environmental issue. Failure will occur without warning.

Credit: Reliability, Maintainability, and Supportability Guidebook, Third Edition, Society of Automotive Engineers, Inc., Warrendale, PA, 1995.

The Probability of Occurrence (of Failure) is also based on work in the automotive industry. Table 2, Probability of Occurrence Categories, provides one possible method of quantifying the probability of failure. If there is historical data available, it will provide a powerful tool in establishing the ranking. If the historical data is not available, a ranking may be estimated based on experience with similar systems in the facilities area. The statistical ("Effect") column in Table 2 can be based on operating hours, days, cycles, or another unit that provides a consistent measurement approach. The statistical bases ("Comment") may be adjusted to account for local conditions. For example, one organization changed the statistical approach for rankings 1 through 5 to better reflect the number of cycles of the system being analyzed.

Table 2. Probability of Occurrence Categories

Ranking	Effect	Comment
1	1/10,000	Remote probability of occurrence; unreasonable to expect failure to occur.
2	1/5,000	Low failure rate. Similar to past design that has, in the past, had low failure rates for given volume/loads.
3	1/2,000	Low failure rate. Similar to past design that has, in the past, had low failure rates for given volume/loads.
4	1/1,000	Occasional failure rate. Similar to past design that has, in the past, had similar failure rates for given volume/loads.
5	1/500	Moderate failure rate. Similar to past design that has, in the past, had moderate failure rates for given volume/loads.
6	1/200	Moderate to high failure rate. Similar to past design that has, in the past, had moderate failure rates for given volume/loads.
7	1/100	High failure rate. Similar to past design that has, in the past, had high failure rates that have caused problems.
8	1/50	High failure rate. Similar to past design that has, in the past, had high failure rates that have caused problems.
9	1/20	Very High failure rate. Almost certain to cause problems.
10	1/10+	Very High failure rate. Almost certain to cause problems.

Credit: Reliability, Maintainability, and Supportability Guidebook, Third Edition, Society of Automotive Engineers, Inc., Warrendale, PA, 1995.
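
Tables 1 and 2 can be combined into a simple screening score. The sketch below makes two assumptions that the tables themselves do not state: an observed failure rate is assigned the highest Table 2 ranking whose rate it meets or exceeds, and criticality and probability are multiplied to rank risk (a common FMEA convention). The function names are hypothetical.

```python
from fractions import Fraction

# Table 2 rankings and their rates (failures per hour, day, cycle, or
# whatever consistent unit the organization selects).
TABLE_2 = [
    (1, Fraction(1, 10000)), (2, Fraction(1, 5000)), (3, Fraction(1, 2000)),
    (4, Fraction(1, 1000)), (5, Fraction(1, 500)), (6, Fraction(1, 200)),
    (7, Fraction(1, 100)), (8, Fraction(1, 50)), (9, Fraction(1, 20)),
    (10, Fraction(1, 10)),
]


def probability_ranking(failure_rate):
    """Return the highest Table 2 ranking whose rate is <= failure_rate.

    Rates below 1/10,000 are given ranking 1 (assumed binning rule).
    """
    ranking = 1
    for rank, threshold in TABLE_2:
        if failure_rate >= threshold:
            ranking = rank
    return ranking


def risk_score(criticality, failure_rate):
    """Multiply the Table 1 ranking by the Table 2 ranking (assumed rule)."""
    if not 1 <= criticality <= 10:
        raise ValueError("criticality must be a Table 1 ranking from 1 to 10")
    return criticality * probability_ranking(failure_rate)


# Example: a "Very High" criticality function (ranking 8) on equipment
# observed to fail about once in 150 cycles.
print(probability_ranking(Fraction(1, 150)))  # 6
print(risk_score(8, Fraction(1, 150)))        # 48
```

A site that adjusts the Table 2 rates for local conditions, as described above, would only need to change the `TABLE_2` list.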

F. RCM Implementation

There is no one set path for successfully implementing RCM because RCM is more than just performing a Failure Modes and Effects Analysis (FMEA), adopting condition monitoring techniques, and/or optimizing a maintenance and overhaul program through the application of an Age Exploration (AE) process. A successful RCM implementation process must first recognize what the sources of return on investment (ROI) are and where they reside. The source(s) of ROI may be tangible and/or intangible. For the former, a quantifiable business case may be developed based on financial benefit (savings, cost avoidance, reduced Work in Progress (WIP), and/or reduced liability) to the organization, while for the latter, the benefit may be unquantifiable (employee skills, morale, customer relations, etc.). In either case, a baseline and goal must be established through some mechanism such as internal or external benchmarking, which results in a defined gap between the "As-Is" and the "To-Be" state and the ROI identified for closing all or a portion of the gap.

Remember, caveat emptor. That is, RCM is not for everyone, and very few organizations will benefit from implementing all elements of a classical RCM program. RCM, like all tools and processes, has an element of diminishing returns. Not all the elements of RCM that are applicable to a nuclear power plant, the aircraft industry, and/or a 24/7 continuous process plant in a sold-out condition will be applicable to a batch process operation or a non-production facility. However, there are a few truths everyone should follow, and acting on them does not require a pilot or an FMEA. They are:

Key performance indicators (KPIs, also known as metrics or performance indicators) are essential for establishing the baseline, goal, and the gap. Progress cannot be measured or sustained without KPIs. (See Section G, Key Performance Indicators (KPIs) Selection.)
Thermography works for electrical distribution, boilers, couplings, roofing systems and building façades.
If your specifications for alignment, imbalance, motor circuit phase impedance, oil condition and cleanliness, and vibration are not quantified, the product you receive will have latent defects 80% of the time.
If you do not commission and check the sequence of operation of your equipment and buildings to a predetermined quantifiable specification, you will not get what you expect.
Pareto analysis is the best tool for determining where to start your RCM process. Look for the bottlenecks, the recurring failures, and follow the money.
RCM implementation in a team environment works better.
Failure modes for identical equipment are the same. Only the consequence and probability of failure change.
The impact of poor water chemistry is underestimated in terms of energy consumption and life-cycle cost.
The majority of failures are random. Very few machines understand how a calendar works. Age Exploration can reveal hidden assets.
Celebrate and advertise your successes and address your failures. Credibility is a key to building support for long-term success.
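
The Pareto analysis recommended in the list above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the asset names and annual failure costs in `EXAMPLE_COSTS` are hypothetical, as are the function names and the 80% threshold, which is only a common rule of thumb. The code ranks assets by failure cost and returns the "vital few" that account for most of the total.

```python
from typing import NamedTuple

# Hypothetical annual failure costs per asset (illustrative values only).
EXAMPLE_COSTS = {
    "chiller_1": 42000,
    "ahu_3": 18000,
    "boiler_2": 25000,
    "pump_7": 9000,
    "elevator_a": 6000,
}


class ParetoRow(NamedTuple):
    asset: str
    cost: float
    cumulative_share: float  # running fraction of total cost, 0..1


def pareto_rank(costs: dict) -> list:
    """Sort assets by cost, highest first, with cumulative share of the total."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    rows = []
    running = 0.0
    for asset, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        running += cost
        rows.append(ParetoRow(asset, cost, running / total))
    return rows


def vital_few(costs: dict, threshold: float = 0.8) -> list:
    """Return the top-ranked assets whose cumulative share first reaches threshold."""
    selected = []
    for row in pareto_rank(costs):
        selected.append(row.asset)
        if row.cumulative_share >= threshold:
            break
    return selected


if __name__ == "__main__":
    for row in pareto_rank(EXAMPLE_COSTS):
        print(f"{row.asset:12s} {row.cost:>8.0f} {row.cumulative_share:6.1%}")
    print("Start with:", vital_few(EXAMPLE_COSTS))
```

In practice the cost figure would come from work-order history and could be replaced by downtime hours or failure counts, depending on which best reflects "the money" for the facility.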

G. Key Performance Indicators (KPIs) Selection

Significant thought must go into selecting KPIs to support the maintenance program. The value of meaningful KPIs cannot be overstated; equally, the harm done by KPIs that are inaccurate or inapplicable should not be underestimated. First, identify the goals and objectives of the organization, because they will affect the selection of KPIs at all levels of maintenance activity. KPIs that cannot possibly be obtained should not be chosen, and only those that can be controlled should be selected. Issues of concern should also be identified so that they are considered in the selection. All process owners who are key to implementing the overall effort should have a self-selected metric that indicates goals and progress toward them. This fosters acceptance of collecting the supporting data and promotes the use of KPIs for continuous improvement. The organization's capability to collect the data must also be considered, i.e., the process used for collecting and storing the data and the ease of extracting and reporting the KPIs. In doing this, the cost of obtaining data for the KPIs and the relative value they add to the overall program must be calculated. While advocating doing the right things within the maintenance program with life-cycle cost as a driver, the cost of capturing the supporting KPIs must also be watched closely.
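
The selection criteria above (obtainable, controllable, and worth more than they cost to collect) can be expressed as a simple screen. The sketch below is one possible reading, with assumptions stated plainly: the `CandidateKPI` fields, the function name, and the idea of putting a single annual dollar figure on each KPI's value are all hypothetical simplifications. Real programs may need to weigh unquantifiable benefits separately.

```python
from dataclasses import dataclass


@dataclass
class CandidateKPI:
    name: str
    obtainable: bool     # can the data realistically be collected?
    controllable: bool   # can the owning group influence the result?
    annual_cost: float   # estimated cost to collect, store, and report
    annual_value: float  # estimated benefit to the program (an assumption)


def screen_kpis(candidates: list) -> list:
    """Keep KPIs that are obtainable, controllable, and worth more than they cost.

    Survivors are ordered by net value (value minus cost), highest first.
    """
    kept = [
        c for c in candidates
        if c.obtainable and c.controllable and c.annual_value > c.annual_cost
    ]
    return sorted(kept, key=lambda c: c.annual_value - c.annual_cost, reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates for illustration only.
    candidates = [
        CandidateKPI("pm_compliance", True, True, 2000, 15000),
        CandidateKPI("weather_downtime", True, False, 500, 4000),
        CandidateKPI("manual_log_audit", True, True, 12000, 3000),
        CandidateKPI("rework_rate", True, True, 1500, 6000),
    ]
    for kpi in screen_kpis(candidates):
        print(kpi.name, kpi.annual_value - kpi.annual_cost)
```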

1. Benchmark Selection

After selection of the appropriate KPIs is complete, benchmarks should be established. These characterize the organization's goals and/or progress points for using KPIs as a tool for maintenance optimization and continuous improvement. Benchmarks may be derived from the organizational goals and objectives or they may be selected from a survey performed with similar organizations. These benchmarks will be used as a target for growth and to evaluate risk associated with non-achievement of progress.
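
As a rough illustration of using a benchmark as a target, progress can be expressed as the fraction of the baseline-to-benchmark gap that has been closed. The function name and the sample figures below are hypothetical. The same formula works whether a higher value is better (e.g. schedule compliance) or a lower one is (e.g. share of reactive work), because the sign of the gap cancels.

```python
def gap_closed(baseline: float, current: float, benchmark: float) -> float:
    """Return the fraction of the baseline-to-benchmark gap closed so far.

    0.0 means no progress from the baseline, 1.0 means the benchmark is met,
    and a negative value means performance has moved away from the benchmark.
    """
    gap = benchmark - baseline
    if gap == 0:
        raise ValueError("benchmark equals baseline; there is no gap to close")
    return (current - baseline) / gap


if __name__ == "__main__":
    # Hypothetical KPI values, in percent.
    # Higher is better: PM schedule compliance, 70 -> target 90, now 78.
    print(f"PM compliance gap closed: {gap_closed(70, 78, 90):.0%}")
    # Lower is better: reactive work share, 55 -> target 20, now 41.
    print(f"Reactive work gap closed: {gap_closed(55, 41, 20):.0%}")
```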

2. Utilization of KPIs

After benchmarks are established and data collection has begun, the information must be acted on in a timely manner to maintain continuity within all of the processes that rely on KPIs as a performance enhancement tool. To take full advantage of the benefits of KPIs, they should be displayed in public areas.

Tracking and publishing KPIs tells people what is important, what the goals are, and where they stand with respect to performance expectations. Displaying KPIs often has an immediate effect on the workers in the functional area being measured. In addition, KPIs are an integral part of any Team Charter, as they allow the Team and Management to determine Team priorities and measure productivity.

Relevant Codes and Standards

Annual Book of ASTM Standards, Section 5: Petroleum Products, Lubricants, and Fossil Fuels
ASTM C1060 Standard Practice for Thermographic Inspection of Insulation Installations in Envelope Cavities of Frame Buildings
ASTM C1153 Standard Practice for the Location of Wet Insulation in Roofing Systems Using Infrared Imaging
ASTM E1186 Standard Practices for Air Leakage Site Detection in Building Envelopes and Air Barrier Systems
ASTM E1316 Standard Terminology for Nondestructive Examinations
Contamination Control for the Fluid Power Industry, Second Edition by E.C. Fitch and I.T. Hong. Silver Spring, MD: Pacific Scientific Company, 1990.
Effective Machinery Measurement using Dynamic Signal Analyzers, Application Note 243-1, Hewlett Packard, 1990.
The Fundamentals of Signal Analysis, Application Note 243, Hewlett Packard, 1991.
Handbook of Lubrication and Tribology, several volumes, by Society of Tribologists and Lubrication Engineers (STLE).
ISO 3945 Mechanical Vibration of Large Rotating Machines with Speed Range from 10 to 200 rev/s: Measurement and Evaluation of Vibration Severity in Situ
ISO 6781 Thermal Insulation: Qualitative Detection of Thermal Irregularities in Building Envelopes: Infrared Method
Laser Alignment Specification for New and Rebuilt Machinery and Equipment, Specification A 1.0-1993, General Motors, 1993.
MIL-P-24534, Planned Maintenance System: Development of Maintenance Requirement Cards, Maintenance Index Pages, and Associated Documentation, U.S. Naval Sea Systems Command
MIL-STD 2173 (AS), Reliability-Centered Maintenance Requirements for Naval Aircraft, Weapons Systems and Support Equipment, U.S. Naval Air Systems Command
MIL-STD-2194 (SH), Infrared Thermal Imaging Survey Procedure for Electrical Equipment
NAVAIR 00-25-403, Guidelines for the Naval Aviation Reliability Centered Maintenance Process, U.S. Naval Air System Command
Reliability-Centered Maintenance Handbook, 59081-AB-GIB-010/MAINT, U.S. Naval Sea Systems Command
SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes, SAE International
SAE JA1012, A Guide to the Reliability-Centered Maintenance (RCM) Standard, SAE International
SNT-TC-1A Recommended Practice Number

Additional Resources

Federal Agencies

Organizations

Publications

Complete Building Equipment Maintenance Desk Book, Second Edition by Sheldon J. Fuchs. Englewood, NJ: Prentice-Hall, 1992.
Continuous Commissioning Guidebook for Federal Energy Managers. DOE Federal Energy Management Program (FEMP), October 2002.
Dependability Management-Part 3-11: Application Guide-Reliability Centered Maintenance by International Electrotechnical Commission. Document No. 56/651/EDIS.
Maintainability: A Key to Effective Serviceability and Maintenance Management by B.S. Blanchard, D. Verma and E.L. Peterson. New York: John Wiley & Sons, Inc., 1995.
Maintenance Technology—The Source for Reliability Solutions
Operator/Manufacturer Scheduled Maintenance Development (MSG-3) by Air Transport Association (ATA). Washington, DC.
Procedures for Performing a Failure Mode, Effects and Criticality Analysis by Department of Defense. Washington, DC, 1984. Military Standard MIL-STD 1629A, Notice 2.
Reliability Centered Maintenance by Anthony M. Smith. New York: McGraw-Hill, 1993.
Reliability-Centered Maintenance by F. Stanley Nowlan and Howard Heap. San Francisco: Dolby Access Press, 1978.
Reliability-Centered Maintenance by F. Stanley Nowlan and Howard Heap. Washington, DC: Department of Defense, 1978. Report Number AD-A066579.
Reliability-Centered Maintenance by John Moubray. Oxford: Butterworth-Heinemann Ltd., 1991.
Reliability Centered Maintenance, A Practical Guide for Implementation by G. Zwingelstein. Paris: Hermes, 1996.
Reliability Centered Maintenance Guide for Facilities and Collateral Equipment. National Aeronautics and Space Administration, 1996.
Reliability-Centered Maintenance: Management and Engineering Methods by Ronald T. Anderson and Neri Lewis. London & New York: Elsevier Applied Science, 1990.
Reliabilityweb.com
Reliability, Maintainability, and Supportability Guidebook, Third Edition by Society of Automotive Engineers, Inc. Warrendale, PA, 1995.
Risk-Based Management: A Reliability-Centered Approach by Richard B. Jones. Houston, TX: Gulf Publishing Company, 1995.

Other

American Productivity & Quality Center (APQC)—Benchmark Resource
Center for Risk and Reliability, University of Maryland
Reliability & Maintainability Center, University of Tennessee
PTC®, Inc.— Windchill FMEA (Failure Modes and Effects Analysis)
