September 2008

Maintenance and the Bottom Line

Going lean increases profits, as long as the manufacturer maintains a reliable plant.

by Neil B. Bloom

Every day, more manufacturers realize that they can no longer afford to produce mounds of inventory awaiting sales. They have learned that they must increase corporate profits by minimizing inventory costs.

You can call it lean manufacturing, just-in-time, or anything else: If a company can manufacture only what it needs when the customer wants it, no excess inventory is necessary. What becomes necessary instead is a production facility that a manufacturer can rely on to turn out products as orders come in. However, reliability requires a rational program of maintenance.

Too little maintenance, or too much in the wrong places, leaves a plant vulnerable to production line breakdowns, not to mention more serious emergencies, which can range from power outages to explosions, depending on the company’s line of business. Even when a plant is safe, it can still be vulnerable. If the product cannot reach the customer when promised, the manufacturer will face major problems. An order can be canceled or, if failures happen too often, the manufacturer can lose revenue over the long term as potential customers decide to take their business elsewhere.

That’s where a program of reliability centered maintenance can make a significant contribution to the health of an enterprise.

RCM is a logical way of identifying the equipment in a plant or facility that must be maintained on a preventive basis rather than on a fail-and-fix basis. “Don’t fix it until it breaks” might be acceptable if no one’s life or livelihood will be lost in the breakdown. Fluorescent light tubes can run to failure. Key production machinery cannot.

I define reliability centered maintenance as: “A set of tasks generated on the basis of a systematic evaluation that are used to develop or optimize a maintenance program. RCM incorporates decision logic to ascertain the safety and operational consequences of failures and identifies the mechanisms responsible for those failures.”

In essence, RCM decision logic will determine if the failure of a component will result in an unwanted consequence to the plant operation, such as an unplanned shutdown, or an unplanned power reduction, or worse yet, an unexpected disaster.

RCM is not new. Maintenance Steering Group Logic, the predecessor to RCM, was used by the airlines in the early 1960s. Stanley Nowlan and Howard Heap of United Airlines introduced formal RCM to the commercial aviation industry in 1978. Airline preventive maintenance and reliability are primarily based on their work, and they are considered the grandfathers of reliability centered maintenance.

When I talk about a plant or facility, I am talking about any type of plant or facility, private or public. It could be a power generating facility, a shoe factory, a chip maker, an offshore oil platform, a daily newspaper plant, a paper mill, an automotive assembly line, a hospital, or a cruise ship—that is, any entity that manufactures a product or produces an output where it is unacceptable to incur unplanned interruptions or, worse yet, an event that can threaten life, limb, or prosperity.


Preventable Disasters


Some disruptions can’t be prevented. Others can be reduced in frequency. Some can be almost entirely eliminated.

There is little we can do about a hurricane or an earthquake except to build structures to withstand them.

Human error can cause disruptions, too. Fatigue, distraction, or inexperience in managing vehicles and equipment can cause serious, sometimes fatal accidents. Tools that are used to reduce instances of human error include better training, more specific procedural guidance, a safer work environment, and more rigid standards and codes. Human emotions and the limits of endurance make it impossible to eliminate human error entirely.

Many serious disruptions in factories and other facilities have their origin in the failure of equipment. The consequences of an event of this sort can range from loss of business to loss of property, and in extreme cases to loss of life. Equipment failures probably offer the greatest chance of all for prevention. Nothing is ever 100 percent reliable, whether it is an aircraft, the Space Shuttle, or a nuclear power plant. However, disasters caused by equipment failure can be controlled to a degree that allows for almost 100 percent reliability.

We have a lot of control over the way we maintain our facilities and equipment to prevent failures. A program of reliability centered maintenance is probably the best path you can take toward reaching as close as possible to that 100 percent reliability goal.

There is a lot of verbiage associated with RCM. Sometimes discussion is limited and will cover boundaries, functions, interfaces, or functional failures. Detailed treatments may talk about establishing system boundaries, subsystem boundaries, in-system in-interfaces, in-system out-interfaces, out-system in-interfaces, out-system out-interfaces, system functions, subsystem functions, failures of sub-system functions, consequences of the functional failures, and so on.

The abundance of terms is one of the reasons that RCM has been so difficult to implement, and that is unfortunate, because most of this cumbersome process is not even needed. Although it is not an effortless challenge, RCM is not rocket science either.

The real cornerstones of reliability centered maintenance are three key ideas:

• Understanding hidden failure modes, which are failures that are not evident until some other additional failure occurs.

• Understanding when a single-failure analysis is not acceptable, and a multiple failure analysis is required.

• Understanding when run-to-failure is acceptable and when it is not.

To illustrate some of these concepts, consider a possible configuration in a plant in which two pumps, each with a discharge check valve, feed into a common header. If one of the valves fails in the open position, the failure is not noticed as long as both pumps continue to run. If the pump with the failed valve stops running, the second failure makes the first evident because the flow will begin to back up into that pump. That would not happen if the valve had not failed.

Had the pump failed first, or the valve failed in the closed position, the failure would have been immediately evident. But the other possibility must be accounted for, so this configuration would require a multiple failure analysis.
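The reasoning in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it models a single pump and its check valve while the other pump keeps running, and the function name observable_effects and the state labels are invented here, not taken from any RCM standard or tool.

```python
# Hypothetical model of ONE pump and its discharge check valve,
# assuming the second pump keeps feeding the common header.

def observable_effects(pump_running, valve_state):
    """Return the set of symptoms an operator could notice.

    valve_state is one of "ok", "stuck_open", "stuck_closed"
    (labels assumed for this sketch).
    """
    effects = set()
    if not pump_running:
        effects.add("pump stopped")
        # With the pump stopped, a valve stuck open lets header
        # flow back up into the idle pump.
        if valve_state == "stuck_open":
            effects.add("backflow into pump")
    elif valve_state == "stuck_closed":
        # A running pump pushing against a closed valve delivers nothing.
        effects.add("no discharge flow")
    return effects


# Each single failure, taken alone: (pump_running, valve_state)
single_failures = {
    "pump stops": (False, "ok"),
    "valve sticks closed": (True, "stuck_closed"),
    "valve sticks open": (True, "stuck_open"),
}

# A failure is hidden if, on its own, it produces no observable effect.
hidden = [name for name, state in single_failures.items()
          if not observable_effects(*state)]

# The second failure (pump stops) is what exposes the hidden one.
combined = observable_effects(False, "stuck_open")

print("Hidden single failures:", hidden)
print("Effects once the pump also stops:", sorted(combined))
```

In this sketch only the valve stuck open produces no symptom by itself, which is why a single-failure analysis would miss it and a multiple-failure analysis is needed for this configuration.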

An RCM program looks at every functioning part of a production line and considers the consequence and the likelihood of its failure. It identifies the criticality of each component, identifies the preventive maintenance techniques (which may include predictive maintenance) needed to prevent critical failures, and implements them.

An RCM analysis also considers that maintenance budgets have limits and gives some rational basis for deciding what to do and where to expend the most effort. In essence, the RCM decision logic determines whether a given component in the plant needs preventive maintenance to avert the consequences of an unplanned failure, or whether the component can be a candidate for a run-to-failure strategy.
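A heavily simplified sketch of such decision logic follows. It is not the full RCM logic tree: the Component fields, the strategy function, and the three yes/no questions are assumptions made for illustration, and a real analysis would also weigh the likelihood of failure and the cost of each task.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Answers to three questions an analyst might record (assumed fields)."""
    name: str
    failure_is_evident: bool               # would someone notice the failure on its own?
    consequence_alone: bool                # safety/operational consequence by itself?
    consequence_with_second_failure: bool  # consequence if another failure follows?


def strategy(c):
    """Return a maintenance strategy under the simplified rules of this sketch."""
    if c.consequence_alone:
        # A single failure already hurts the plant: prevent it.
        return "preventive maintenance"
    if not c.failure_is_evident and c.consequence_with_second_failure:
        # Hidden failure that becomes harmful with a second failure:
        # requires a multiple-failure analysis, so it is not run-to-failure.
        return "preventive maintenance"
    # Evident failure with no consequence of its own: fix it when it breaks.
    return "run-to-failure"


examples = [
    Component("fluorescent light tube", True, False, False),
    Component("key production machine", True, True, True),
    Component("pump discharge check valve", False, False, True),
]

for c in examples:
    print(f"{c.name}: {strategy(c)}")
```

Note that the check valve ends up on the preventive list even though its failure alone causes no harm. Under these assumed rules, it is the hidden nature of the failure, not the presence of redundancy, that decides whether run-to-failure is acceptable.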

Utilities and other industries have implemented various forms of RCM programs only to find that they continued to have fundamental reliability issues that were not addressed by their analyses. The primary reason is the lack of a grass-roots philosophical understanding of the principles governing the analysis.

RCM, meanwhile, is not defined by any of these ideas:

• A process that picks and chooses the systems and/or the components to analyze.

• A process that selectively analyzes a few given systems or certain components that everyone, including the janitor, knows are a problem and that have a major effect on the operation of the plant or facility when they fail.

• A PM review of what is already being done on system ABC or component XYZ.

• Converting time directed maintenance tasks into condition directed predictive maintenance tasks.

• Performing an analysis on a piece part, such as a bearing or a shaft, for example.

• Establishing PM templates for PM tasks.

In my presentations at maintenance and engineering conferences around the country and in workshops for major industrial organizations, I have found that there is a certain “fuzziness” or “mystique” associated with implementing an RCM program. Much of this fuzziness arises when attempting to analyze redundant equipment, to identify hidden failures, to invoke a run-to-failure strategy, and to determine when a single-failure analysis is acceptable and when a multiple-failure analysis is required. Remember, it is not the obvious that creates the greatest potential for disaster. It is the unobvious.


Where Did It Come From?


In the early years of commercial jet aviation, aircraft manufacturers and airlines believed that if an aircraft was overhauled at a given time interval and completely torn apart, virtually system by system, component by component, then once it was released from the hangar it would perform totally reliably until the next major overhaul, as long as intermediate maintenance activities were carried out. Most of the equipment was completely overhauled, whether it needed it or not. The manufacturers and the airlines found, though, that expected levels of reliability were still not met. So they tried to perform the overhaul more often to reach the level of reliability they were seeking.

Once again, the entire aircraft and virtually all of its components were completely torn apart as before, only more often, and again planes remained in the hangar for weeks earning no revenue. Once the planes were released from the hangar, expected levels of reliability were still not achieved and, in fact, were even lower than expected.

This disappointment led to the work of Nowlan and Heap. They began to understand that preserving critical equipment functions, rather than randomly and arbitrarily tearing an entire aircraft apart, was the key to reliability. They also found that indiscriminately overhauling equipment actually reduced reliability because the probability of failure of the newly replaced equipment increased due to premature failures and infant mortality.

The reasons for this phenomenon are many. Taking apart and reassembling a complex system introduces opportunities for human error. The major contributors are incorrect assembly after overhaul, incorrect parts being used, or incorrect installation. Additionally, there is usually a wear-in pattern that must be developed after replacement, and during that time frame, the probability of failure increases.

Another interesting phenomenon they discovered was that similar components did not wear out over time in any sort of identical manner. In fact, Nowlan and Heap showed that only approximately 11 percent of all types of components exhibited a wear rate that lent itself to replacement on a given schedule. That meant that the remaining components, almost 90 percent of all types, failed randomly. Scheduled overhauls proved to be counterproductive for this population. The solution is predictive maintenance, monitoring the health of these parts on a frequent basis to ascertain more precisely when an overhaul is absolutely necessary. You don’t want to waste resources indiscriminately overhauling equipment.

An effective RCM program will allow a preventive maintenance program to evolve from a level based primarily on vendor recommendations, random selection, or arbitrary assignment, to one based on more prudent fundamentals, such as a component functional analysis and the identification of any subsequent safety or operational consequences to your facility as a result of the component functional failure. This will provide a greater confidence level that your preventive maintenance program consists only of those tasks that are specifically required for the safe, reliable, and efficient operation of the plant and that any unnecessary work has been eliminated.

Scheduling unnecessary preventive maintenance activities may reduce overall plant reliability because of the burdens it places on operations and maintenance personnel. The additional burdens contribute to the unnecessary depletion of available resources.

More than ever, the bottom line of a corporation is dependent on the reliability of its output. Companies must minimize delays, maintain capacity, ensure safety, and avoid the cost and possible scandal of legal or regulatory infractions. RCM is a means of identifying equipment functions that must be preserved to protect corporate assets and ensure the corporate revenue stream.

I would also like to note that the RCM process must remain a living one. New failure modes may become evident and additional information relative to equipment performance may present itself at any time. Often, schedules of certain preventive maintenance tasks need to be adjusted. Intervals may need to be increased or decreased. Newly identified tasks may be added and others deleted based on new or different operating conditions or plant modifications.


It’s the Unexpected


The vast majority of major disasters due to equipment failure are caused by equipment whose consequences of failure were unexpected and never analyzed, or whose components were mistakenly approved for run-to-failure. These disasters are “surprises.”

A company is asking for surprises of this sort if it budgets maintenance expenses only for the well-known problem components. Management is taking on a substantial risk of catastrophic failure if it eliminates as much as 80 percent of a plant from analysis for functional failures. Bad surprises can also arise from the misconception that redundant components automatically qualify for a run-to-failure status.

Most plants have maintenance budgets that encompass the known maintenance work that needs to be done. However, one unexpected failure can exceed that budget 2, 4, or 10 times over. Worse yet, one unexpected disaster can ground your entire fleet or shut down your facility for good. Real-life examples of major disasters resulting from equipment failures are plentiful. A BP oil refinery explosion in Texas is just one of them. Several fatalities resulted from that recent disaster. A committee headed by James Baker, the former Secretary of State, was commissioned to investigate the cause of the incident. One of the commission’s final findings was that inattention to proper preventive maintenance at the facility led to the occurrence.

Given the nature of running a company today, the maintenance organization is no longer relegated to the farm team, behind sales, marketing, and finance. The cultural shift taking place is elevating the importance of the maintenance organization, as an integral team member to ensure safety and reliability and to maximize the bottom line.


Neil B. Bloom is the author of Reliability Centered Maintenance—Implementation Made Simple, published by McGraw-Hill. He is an instructor of RCM in the Continuing Education Division at the University of California, Irvine, and conducts in-house RCM training seminars for major corporations. He can be reached at (949) 218-1286. His Web site is www.RCMtrainingseminars.com.

 


© 2008 by The American Society of Mechanical Engineers