by James R. Chiles
One principle
that holds true across two centuries of human experience on the machine
frontier is that technological disasters usually don't come like
bolts from the blue. Instead, little malfunctions and errors link up beforehand,
over weeks and months. Usually, these are early warning signs, called
precursors, that offer time for those at the scene to stop the chain of
events.
Although it seems unbelievable to outsiders that anyone at the site would
have ignored the precursors (knowing as we do afterward that something
terrible really did happen), there had to be powerful reasons why
nothing was done. Perhaps nobody wanted to take the career risk of raising
the issue, or nobody even noticed the significance of the precursors among
the daily background noise of routine problems.
[Image caption: A routine of watching for early hints of trouble has helped keep Southwest Airlines flying for more than 30 years without a fatality.]
One employee who saw his duty and did it was Jerry Gonsalves. In the
spring of 1960, Gonsalves was working for the Cape Canaveral operations
of the Glenn L. Martin Co. as a quality assurance supervisor. At the time,
Martin was perfecting the Titan intercontinental ballistic missile for
the Air Force.
When he came to work on the afternoon of March 31, 1960, Gonsalves found
a pile of parts on his desk along with some missile blueprints. The parts
had been removed in the course of oxygen tank modification work just finished
on a Titan 1 missile that was ready for a test launch. His job was to
match the leftover parts against the plans, then pass them along for further
quality checks.
When Gonsalves counted the parts, they didn't quite match the plans.
After an hour of double-checking, he decided one bolt was missing from
the pile of leftovers. After finding out that none of the technicians
had accidentally left the job with the bolt still in his pocket, he became
convinced the stray bolt had rolled from sight down the curve of the oxygen
tank, falling into an outlet at the bottom. This outlet led to the oxygen
pump.
It concerned him because if the pump's metal impeller hit a bolt
in the presence of liquid oxygen, it would cause a fireball that would
certainly destroy the missile and probably the launch complex as well.
And accidents were happening: A Titan had blown up on December 12 the
previous year, destroying the pad just after launch.
Going to the Source
Gonsalves went to his supervisor, who suggested he fly to the Martin assembly
plant in Denver right away and see whether the description of leftover
parts might have been off by one bolt, which would explain the discrepancy.
There wasn't much time because the Titan was on Launch Complex
15 at Cape Canaveral.
The next day, Gonsalves examined brand-new Titan missiles on the Martin
assembly line in Colorado. Nothing explained the bolt's absence
so he sent a telegram from Denver saying his group needed to send someone
into the empty LOX tank and hunt for the bolt. The most that managers
were willing to do was have someone open an inspection hole at the top
and peer into the tank, using binoculars and a light. Gonsalves protested,
saying nobody would see the bolt that way if it had fallen into the outlet
of the tank.
Gonsalves, back at the Cape on Friday, refused to put his name on the
final papers needed for launch. An Air Force officer overheard the disagreement
and walked up to ask what was going on. After hearing the details, the
officer stopped the launch and ordered the tank drained and inspected.
A manager suggested that Gonsalves not come to work the next day, a Saturday,
when the Air Force would investigate the tank.
Gonsalves had no regrets: He had been hired to ensure quality and would
not back down, even though he had two young children and a wife
to support, and no immediate prospects of another job in the field of
quality assurance engineering.
Gonsalves took a friend fishing the next day. He even caught a sailfish
on the trip, the first time he'd landed one. Still better, he heard
upon returning home that the Air Force had found the missing bolt in a
pipe leading from the tank. The Air Force's senior officer on the
project sent Gonsalves a commendation letter the following October.
Jerry Gonsalves knew from a single, subtle precursor that his corner of
the Air Force test program was facing a fire and explosion, the kind of
thing now called an imminent catastrophic event.
For an example that helps explain the role of precursors in the huge variety
of disasters (and in the much greater number of close calls, mostly undocumented),
consider how a piece of metal breaks over time. Under stress, cracks begin
to grow out of tiny manufacturing flaws, corrosion, and damage during
use. Then, at a critical point, a metal fracture spreads like a gunshot
and the piece fails completely.
[Image caption: The Titan 1 missile on the launch pad.]
As with metal, weak points exist in virtually all systems. But instead
of slag inclusions, nicks, and stress-corrosion cracks that we find in
metal, weak points in a system are made up of human errors and machine
malfunctions. No system is entirely free of weak points, so a good system
is one in which people catch these incipient "system fractures"
early, before a chain of them can link up to generate a catastrophe.
Southwest Airlines, which has never had a fatal crash in more than three
decades and thousands of short-hop, time-critical flights, is one such
"fracture-aware" company, and it is a profitable one.
Companies and agencies that deal in high-power, complex machines should
know that technological disasters have a long-lasting cost. They should
learn from the best organizations about how to catch system fractures
early.
Besides the obvious toll in deaths, damage, insurance cost increases, and
months of business interruption, in some cases so much public mistrust
follows that an entire branch of industry is cut off. Crashes of two Comet
airliners in 1954 halted British jet airliner manufacturing for so long
that the American airplane makers took over the market. The scale of devastation
can be enormous: A series of dam failures in China on the night of Aug.
7, 1975, killed more than 26,000 people. Incident costs ran well over
$4 billion at the Three Mile Island Unit 2 partial meltdown.
The Way to Catch Fractures
Catching system fractures happens in several stages. Alert employees notice
early problems. They decide that the potential problem is serious and
that preventing a disaster will need management support.
They put together a message to management arguing for attention to the
problem. Before this warning memo is fired off, however, they should consider
running it past some colleagues who can come up with the kind of tough questions
sure to come from managers who see this as an imaginary problem. The military
calls this probing, questioning process a "murder board."
Ideally, management reaches some kind of decision, rather than deferring
and delaying. The decision might be to authorize a remedial action to
fix the system in question, but managers could also conclude that "it
ain't broke." A decision not to fix something can be as
legitimate as a decision to take action, if supported by facts and something
like a failure mode effects analysis. Not all signs of trouble need extraordinary
action.
Finally, any lessons should go out to others who need to know. This step
is often neglected by organizations that treat each mishap as an isolated
case, never to be repeated. Good "lesson harvesting" might
take the form of a bulletin to similar plants, or a change in manufacturing,
or improved training.
[Image caption: The Titan 1 missile in the air: A missing bolt led a quality assurance engineer to correct a possibly disastrous problem in the liquid oxygen tank.]
One of the few fields with good tracking of incipient system fractures
is aviation, with the Aviation Safety Reporting System. I drew on the
most instructive close-call accounts I could locate when writing my book
Inviting Disaster: Lessons From the Edge of Technology, but am
always looking for more.
Leaders can do much to put frontline employees in a "fracture aware"
frame of mind. Are workers encouraged to pass along problems they notice,
such as bolt holes not lining up on a steel-frame construction project?
Do workers see evidence of solid follow-up by management about their concerns?
Encouragement for thorough problem reporting should extend to the employees'
own errors. If workers are punished for fessing up to their own mistakes,
they are pretty likely to cover them up, possibly with grave consequences
later.
Technological crises are inevitable in the course of all major projects,
so knowing how to work through them is a valued skill. Good decision making
is what we're looking for, not perfect decisions.
While he was head of the vaunted Naval Reactors organization, Admiral
Hyman Rickover treated each crisis during manufacturing or testing as
an opportunity, a chance to ferret out problems before a nuclear submarine
went off to sea. He insisted on riding aboard each new submarine in its
sea trials, so if something went wrong during pressure tests, he would
be there to share the consequences of poor work or bad decisions.
While few of us enjoy troubleshooting incipient system fractures, such
work is certain to be at the core of what the most valued project leaders
do in our automated future. Computers are sure to take over many routine
operations that people perform now, but troubleshooting is a uniquely
human task and a good place for a career. "No-brainer" jobs
don't have much of a future given the pace of automation, so snag
one of the "brainer" jobs.
Organizations like Southwest Airlines and Naval Reactors, and some employees
like Jerry Gonsalves, have shown that safety, excellence, and production
can all fit together. For the rest of us who aren't quite there
yet, good fracture-awareness, among frontline employees as well
as top decision makers, is an important step in that direction.
James R. Chiles is a lecturer on technology and
safety. He is also the author of Inviting Disaster: Lessons From the Edge
of Technology (HarperBusiness, 2001), the subject of a recent series on
The History Channel. Contact him at chiles@invitingdisaster.com.
© 2004 by The American Society
of Mechanical Engineers