Disaster book club: What you need to read to understand the crash of Air France 447

Right now, I'm reading a book about why catastrophic technological failures happen and what, if anything, we can actually do about them. It's called Normal Accidents by Charles Perrow, a Yale sociologist.

I've not finished this book yet, but I've gotten far enough into it that I think I get Perrow's basic thesis. (People with more Perrow-reading experience, feel free to correct me here.) Essentially, it's this: When there is inherent risk in using a technology, we try to build systems that take into account obvious, single-point failures and prevent them. The more single-point failures we try to prevent through system design, however, the more complex the systems become. Eventually, you have a system where the interactions between different fail-safes can, ironically, cause bigger failures that are harder to predict and harder to spot as they're happening. Because of this, we have to make our decisions about technology from the position that we can never truly make technology risk-free.

I couldn't help but think of Charles Perrow this morning, while reading Popular Mechanics' gripping account of what really happened on Air France 447, the jetliner that plunged into the Atlantic Ocean in the summer of 2009.

As writer Jeff Wise works his way through the transcript of the doomed plane's cockpit voice recorder, what we see, on the surface, looks like human error. Dumb pilots. But there's more going on than that. That's one of the other things I'm picking up from Perrow. What we call human error is often a mixture of simple mistakes, and the confusion inherent in working with complex systems.

Let me excerpt a couple of key parts of the Popular Mechanics piece. You really need to read the full thing, though. Be prepared to feel tense. This story will get your heart rate up, even though (and possibly because) you know the conclusion.

We now understand that, indeed, AF447 passed into clouds associated with a large system of thunderstorms, its speed sensors became iced over, and the autopilot disengaged. In the ensuing confusion, the pilots lost control of the airplane because they reacted incorrectly to the loss of instrumentation and then seemed unable to comprehend the nature of the problems they had caused. Neither weather nor malfunction doomed AF447, nor a complex chain of error, but a simple but persistent mistake on the part of one of the pilots.

Human judgments, of course, are never made in a vacuum. Pilots are part of a complex system that can either increase or reduce the probability that they will make a mistake. After this accident, the million-dollar question is whether training, instrumentation, and cockpit procedures can be modified all around the world so that no one will ever make this mistake again—or whether the inclusion of the human element will always entail the possibility of a catastrophic outcome. After all, the men who crashed AF447 were three highly trained pilots flying for one of the most prestigious fleets in the world. If they could fly a perfectly good plane into the ocean, then what airline could plausibly say, "Our pilots would never do that"?

One of the pilots seems to have kept the nose of the plane up throughout the growing disaster, making this choice over and over, even though it was the worst possible thing he could have done. At the same time, everyone in the cockpit seems to have completely ignored an alarm system that was, explicitly, telling them that the plane was stalling.

Why would they do that? As Wise points out, this is the kind of mistake highly trained pilots shouldn't make. But they did it. And they seem to have done it because of what they knew, and thought they knew, about the plane's complex safety systems. Take that stall alarm, for instance. Turns out, there's a surprisingly logical reason why someone might ignore that alarm.

Still, the pilots continue to ignore it, and the reason may be that they believe it is impossible for them to stall the airplane. It's not an entirely unreasonable idea: The Airbus is a fly-by-wire plane; the control inputs are not fed directly to the control surfaces, but to a computer, which then in turn commands actuators that move the ailerons, rudder, elevator, and flaps. The vast majority of the time, the computer operates within what's known as normal law, which means that the computer will not enact any control movements that would cause the plane to leave its flight envelope. "You can't stall the airplane in normal law," says Godfrey Camilleri, a flight instructor who teaches Airbus 330 systems to US Airways pilots.

But once the computer lost its airspeed data, it disconnected the autopilot and switched from normal law to "alternate law," a regime with far fewer restrictions on what a pilot can do. "Once you're in alternate law, you can stall the airplane," Camilleri says. It's quite possible that Bonin had never flown an airplane in alternate law, or understood its lack of restrictions. According to Camilleri, not one of US Airway's 17 Airbus 330s has ever been in alternate law. Therefore, Bonin may have assumed that the stall warning was spurious because he didn't realize that the plane could remove its own restrictions against stalling and, indeed, had done so.
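To make the excerpt's normal-law/alternate-law distinction concrete, here is a minimal sketch, in Python, of how an envelope-protection mode might degrade when airspeed data goes bad. Every name and number in it (the class, the 15-degree stall angle, the clamping rule) is invented for illustration; this is not Airbus's actual control logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AirDataSample:
    # None models invalid data, e.g. iced-over pitot probes.
    airspeed_knots: Optional[float]

class FlightControlComputer:
    """Toy model of flight-law degradation as the excerpt describes it.

    In "normal law" the computer refuses control inputs that would leave
    the flight envelope; when airspeed data goes invalid it drops the
    autopilot and reverts to "alternate law," where that stall protection
    is gone. Illustrative only -- not real avionics.
    """
    STALL_AOA_DEG = 15.0  # hypothetical stall angle of attack

    def __init__(self) -> None:
        self.law = "normal"
        self.autopilot_engaged = True

    def update_air_data(self, sample: AirDataSample) -> None:
        if sample.airspeed_knots is None and self.law == "normal":
            # Lost reliable airspeed: kick out the autopilot, relax protections.
            self.autopilot_engaged = False
            self.law = "alternate"

    def command_pitch(self, requested_aoa_deg: float) -> float:
        if self.law == "normal":
            # Envelope protection: never command an angle of attack past stall.
            return min(requested_aoa_deg, self.STALL_AOA_DEG - 1.0)
        # Alternate law: the pilot's input goes through, stall and all.
        return requested_aoa_deg

fcc = FlightControlComputer()
print(fcc.command_pitch(20.0))         # clamped in normal law
fcc.update_air_data(AirDataSample(airspeed_knots=None))
print(fcc.law, fcc.autopilot_engaged)  # degraded mode, autopilot off
print(fcc.command_pitch(20.0))         # now passes through, past the stall angle
```

The unsettling part, which the sketch makes visible, is that the same stick input produces two different airplanes depending on a mode the pilot may never have seen.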

That, I think, is where Charles Perrow and Air France 447 cross paths. It lines up closely with a concept Perrow calls "incomprehensibility." Basically, the people involved in an accident like this often can't figure out fast enough what is happening. That's because, in high-stress situations, the brain reverts to well-trod models that help you understand your world. You think about the stuff you've practiced 1000 times. You think about what you've been told will happen, if x happens.

But what happens if what's actually going on doesn't mesh with your training? Then the brain finds ways to make it mesh. Those rational explanations might make a whole lot of sense to you, in the moment. But they will lead you to make mistakes that exacerbate an already growing problem.

This is not comforting stuff.

Perrow doesn't tell us that we can figure out how to design a system that never becomes incomprehensible. There is no happy ending. We can design better systems, systems that take the way the brain works into account. We can make systems safer, to a point. But we cannot make a safe system. There is no such thing as a plane that will never crash. There is no such thing as a pilot who will always know the right thing to do.

Instead, Perrow's book is more about how we make decisions regarding risky technologies. Which high-risk technologies are we comfortable using and in what contexts? How do we decide whether the benefit outweighs the risk?

We must have these conversations. We cannot have these conversations if we're clinging to the position that anything less than 100% safety is unacceptable. We cannot have these conversations if we're clinging to the position that good governance and good engineering can create a risk-free world, where accidents only happen to idiots.

I used to believe both those myths. I want to believe them still. Increasingly, I can't. Looking at technological safety in terms of absolutes is a child's view of the world. What Perrow is really saying is that it's time for us to grow up.

• Landing gear of Air France 447, Investigation and Analysis Bureau.
• Memorial to victims of Air France 447 in Rio de Janeiro, Brazil, REUTERS/Ana Carolina Fernandes.


  1. “Basically, the people involved in an accident like this often can’t figure out fast enough what is happening”

    The irony is they actually had plenty of time to figure out what was happening, but stuck with the knee-jerk reaction (pull back the stick).

    It's worth pointing out that the knee-jerk/instinctive reaction can also be what prevents accidents – when something happens and there's no opportunity to analyze and think, only do.

  2. The more single-point failures we try to prevent through system design, however, the more complex the systems become. Eventually, you have a system where the interactions between different fail-safes can, ironically, cause bigger failures that are harder to predict,

    Just as an example, you can design a nuclear reactor to withstand the worst credible shaking from an earthquake, and, just to be safe, to automatically shut down after even mild shaking so as to give you a chance to make a thorough inspection for damage.

    Then when a very large (but fairly distant) quake produces only moderate local shaking that shouldn’t significantly damage any well-built structure (never mind a structure as carefully and thoroughly reinforced as your reactor), you enter auto-shutdown mode – “just to be safe” – and then the diesel generators that are critical for a safe shutdown get swamped by an unexpectedly-large tsunami generated by the quake.

    That’s what happened at Fukushima.  Ironically, if they’d left the reactor running, they most likely wouldn’t have had a problem, as the generators that got swamped aren’t needed during normal operation.

    1. A lot of Perrow’s book, which was written in the early 1980s, is centered around the chain of events that led to Three Mile Island. It is a veritable cluster fuck of safety features interacting with each other to cause problems and incomprehensible situations that the operators could not have predicted or trained for. 

    2. Also came here to point out the intersection of Perrow’s book with nuclear plants and applicability to Fukushima.  
      Would be interesting to contemplate North Anna shutdown in the wake of the VA quake to have a sense of what went right.  Absent an “accident” however, it may be a while before we know much about what happened with the shut down at North Anna. 

  3. That’s fascinating, that the airplane was operating in a mode which the pilot was completely unfamiliar with. You’d think that they would train the pilots about this mode, no? At least in a simulator where it’s not too hard to pretend that the virtual pitot tubes are iced up.

  4. The verbal warning and “cricket” chime of a crashing airplane is probably the single most chilling thing I’ve ever heard in my life.  There’s a YouTube video of a crashing fighter jet… the pilot bails out safely, as the video from the cockpit continues.  Just before impact you hear: *chirp* PULL. UP. *chirp* PULL. UP.  Wish I could find the link.  Creepiest audio icon ever.

    1. The cold open at the start of the J.J. Abrams reboot of "Star Trek" is a great example of this. Incidentally, that scene will make me freak out and cry my eyes out even though I've seen it several times and of course the entire movie is quite facetious. ("Pitch Black"'s opening crash is #2.)

    2. There’s a YouTube clip of the cockpit recorder from Japan Airlines flight 123 that is similar. Toward the end of it you hear an automated “Pull up…Pull up” amid the frantic pilot yells and alarms. Very disturbing. Over 500 people died in that crash.

      1. I thought of that same clip when I read Zachary’s comment. While being fairly innocuous and nongraphic, that clip has always stuck with me.

  5. This idea of incorrectly mapping known situations onto unknown ones was also one of the points stressed in Laurence Gonzales' book "Deep Survival." The system or situation doesn't even have to be directly man-made for this to happen.

  6. Perrow's main thesis is that when systems have what he calls interactive complexity and tight coupling, unforeseen interactions will occur, and operators will not be able to intervene fast enough, thus producing a "normal accident." One system described in Perrow's book as complex and tightly coupled is the airline system.

    However, it's just a theory. Although anchored in some good empirical studies, it isn't proven. There are competing theories, such as HRO (high-reliability organizations) theory and James Reason's "organizational model of accidents," both of which put a lot of focus on organizational safety culture as a counterweight to unsafe technology.

    Anyone interested in learning more about normal accidents and high-reliability organizations should read Scott Sagan's very well-written book "The Limits of Safety: Organizations, Accidents, and Nuclear Weapons," where both theories are put to use.

    1. Well said.  I was going to say much of this.  Personally, I do like Perrow’s argument, but that doesn’t mean he’s telling the *whole* story.

      Do note, though, that “just a theory” isn’t really accurate, nor is “it ain’t proven.”  This is science, so theory is a goal in itself, and no theory is ever proved.  That’s not what we do.

  7. I quickly came to the "Dumb Pilots" conclusion myself after reading the article on PopMech. But what you have written does make sense: rational people behaving irrationally in times of great stress. Sometimes it's not a lack of knowledge that causes problems, but too much of it. This crash should be taken as an example and studied, so that systems are no longer designed around the 100% safety assumption. Thanks for the post; it was a good read.

  8. I would also note that, if what this account says…

    It’s quite possible that Bonin had never flown an airplane in alternate law, or understood its lack of restrictions.

    …is true, then that kinda suggests that they weren’t really “highly trained.”

    (Unless you’re willing to credit that they were “highly-but-inadequately trained.”)

    1. My pilot friends always say “It’s never just one thing.” In this case, you had a relatively minor technical problem, plus a pilot doing the wrong thing, plus two other pilots who didn’t know that he was doing it, plus …

      Also, bad assumptions. It turns out that when you assume, you don’t just make an ass out of ‘u’ and ‘me’, you also make a colossal splash and a four-mile oil-slick.

  9. It's not only airplanes. Do you know what goes on in your car? You have all kinds of fancy electronics keeping it on the road, so you think it will be safe at 250 km/h. And then something happens that the electronics cannot foresee, or the electronics break down… Accidents in general are going down, but drastic accidents still happen and might increase.

    Having the appearance of safety is the worst thing you can do. It ends up being safe right up until it goes catastrophic. If you get the feeling of being unsafe at some point before things go critical, you stop before the catastrophe happens.

      1. Lucky you. :-)

        I had a defective front tie rod snap once when I was doing about 50 mph.  Not fun.  =:o

        Of course, I suppose “runs off the road, down an embankment, and rolls twice before thudding into a tree” might count as “rolls to a stop.” :-) I suspect, though, that that’s not what you meant by “rolls”. :-)

          1. But what does your tie rod have to do with the computer systems of the car? What conflict of safety measures occurred to make this pertinent to the above comments? That's a manufacturing and quality-control error at a single point of failure, not a systemic conflict.

          Hell, I'd be willing to wager that your vehicle's safety measures actually all deployed appropriately without conflict. Airbags, seatbelts, engine-kill, impact deformation, even anti-lock brakes would still work (although their assistance is limited when one front wheel gains uncontrollable freedom of movement).

          To be honest, the only way I think a car's computer systems could produce the pertinent kind of error is when a careless driver blindly follows inaccurate GPS instructions, assuming they can't be wrong.

          1. But he said “when it [my car] has a catastrophic malfunction”, not “when my car’s computer systems have a catastrophic malfunction.”

            Okay,  maybe I was drifting a bit off-topic, but still…

            As for the ‘other safety measures’ you name: this was many years ago, and the only one of those things the car came equipped with was a lap belt.

            (Fortunately for me, I was an experienced stunt driver, and had just two months earlier installed a 5-point racing harness.) 

            (Coulda used a helmet, though.)

      2. While I have no proof of this or stats, it has been told to me by a reliable source that when all-wheel independent anti-lock brakes became a standard feature on domestic cars, there was a sudden jump in new-vehicle traffic accidents, particularly ones involving slippery conditions. This was determined to be due to the car's new behavior of pulsing the brake pedal and causing the steering wheel to shake. This caused driver panic, and people took their foot off the brake. Not a mechanical problem, but one of unfamiliar conditions. If you want direct evidence of the new-information problem, look at auto yards for wrecked vehicles with less than 25 miles on the odometer. It happens pretty often.
        Consumer vehicles are full of single-point-failure systems; it's just that they are considered reliable enough on their own, or at worst have a predictable and mostly safe fail condition. However, anything is possible, and every imaginable failure, plus a few that are not, are bound to happen. Add driver reaction into the mix, and it gets much more complex.

    1. I’m in the market for a new car, and the sheer amount of safety systems and computer control does make me wonder a lot. The car I’m looking at comes in two flavors: front wheel drive, and all wheel drive. The AWD model has a computer-controlled torque-vectoring AWD system, whereas the FWD model just has regular front-wheel drive. Both of them have four-wheel ABS, Electronic Brake Force Distribution, vehicle stability control, a computer-controlled CVT, an electronic throttle, and push-button start.  (Nissan’s website even goes out of its way to tell you about the Brake Override function and the push-button-start failsafe that lets you kill the engine.) 

      So, what happens if the whole thing goes nuts?  

      1. Well, let’s dissect this one by one.

        CVT or Continuously Variable Transmission is chiefly concerned with maintaining an ideal RPM for the drive shaft. I am NOT an automotive expert, but I believe that in the event of catastrophic failure on the part of the control mechanisms, one of a few things would happen.

        1) The system would simply stop giving and receiving orders, the transmission would stop updating the effective gear ratio, and the gear would become "stuck" where it was last set.

        2) The system would “reset” and go to some default gear ratio which might be inappropriate for the situation.

        3) The system would “disengage” and drop out of gear.

        4) The system would "go haywire" and enter erroneous gear ratios inappropriate for the situation.

        In all four scenarios, the most likely result would be sudden unexpected loss of power, or unusual or nonexistent response to the accelerator. Given various conditions, there may be permanent damage to the physical components of the transmission system itself. Essentially it's like throwing a manual into the wrong gear – it will be physically obvious something is wrong and the driver should immediately stop accelerating.

        Still, this COULD result in undesirable torque spikes, causing wheel slip and potential loss of control.


        AWD or All Wheel Drive refers to various forms of always active Four Wheel Drive systems, in which a center differential allows the front and rear axles to rotate at different speeds. One of the problems with AWD is that because of the differentials, without an active traction control system a loss of traction in ANY of the four tires would result in ALL of them being immobilized. If the computer systems on your AWD fail, you can easily lock all of your tires by driving beyond the system’s passive capabilities. Doing so will likely lead to loss of control.


        ABS or Antilock Brake Systems are a pretty simple concept, they prevent you from accidentally locking your brakes. If you brake too hard, you can stop the wheel from spinning, but the inertia of the vehicle will still carry you forward, causing you to skid. Antilock Brakes simply limit the force your brakes apply, keeping it high enough to slow at an ideal rate but low enough not to stop the tires completely and cause a skid.

        Ultimately, if the ABS fails, a driver could brake too heavily and lock their wheels, possibly losing control.
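The threshold idea behind this description can be sketched in a few lines of Python. This is a toy model, not a real ABS controller: the 0.2 slip threshold and the 50% pressure release are made-up numbers, and real systems pulse pressure many times per second, per wheel.

```python
def abs_modulate(requested_pressure: float, vehicle_speed: float,
                 wheel_speed: float, slip_threshold: float = 0.2) -> float:
    """Toy anti-lock step: back off brake pressure when wheel slip gets too high.

    Slip is 0.0 when the wheel rolls freely with the car and 1.0 when the
    wheel is fully locked. Threshold and release fraction are illustrative.
    """
    if vehicle_speed <= 0:
        return requested_pressure  # stopped: nothing to modulate
    slip = (vehicle_speed - wheel_speed) / vehicle_speed
    if slip > slip_threshold:
        return requested_pressure * 0.5  # release so the wheel can spin back up
    return requested_pressure

print(abs_modulate(100.0, 30.0, 29.0))  # mild slip: full pressure applied
print(abs_modulate(100.0, 30.0, 20.0))  # heavy slip: pressure released
```

If this logic fails, the comment's point holds: nothing stops the driver from commanding full pressure straight into wheel lock.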


        Electronic Brake Force Distribution is almost just an extension of the ABS. It allows for applying variable brake forces to each wheel depending on road conditions to maximize stopping power without losing control.

        Like the ABS, failure could cause wheel lock if the driver brakes too heavily, possibly losing control.


        Electronic Stability Control is a system which operates to prevent skids. Essentially, when a driver gets close to skidding, it selectively applies the brakes to needed individual wheels to control the vehicle.

        It basically corrects for driver error in a situation which might lead to loss of control, a task it naturally will not perform if it fails.


        Electronic Throttle Control is a system in which the gas pedal has no physical interaction with the throttle, meaning all vehicle throttle is determined by computer calculation.

        In the event of a failure, the throttle will simply cut out. I’m not sure on this, but it MAY be possible for the ETC to “go haywire” and supply inappropriate amounts of throttle, but I do NOT actually know.

        Push Button start is pretty much entirely a cosmetic feature, allowing for indirect ignition of the engine without a physical key mechanism.

        If this fails, you simply cannot start the car.

        Now then…

        If EVERY SINGLE ONE of these systems fails at once, there are basically two potential outcomes.

        1) They fail at a non-critical time, the driver is immediately alerted to at least some of the failures by their self-evident natures, and the driver manages to safely come to a stop.

        2) They fail at a critical time such as during a tight high speed turn (in which case the driver is already approaching danger by dint of their driving) and they lose control.

        Note – these systems are all designed to compensate for inherent physics-based driving dangers and risks. They allow a driver to maintain control in extreme situations they otherwise would lose control in.

        For the average driver, this is ostensibly a good thing – it makes the vehicle more idiot proof. The systems cannot stop an idiot from driving recklessly, but they can raise the threshold of critical failure a bit – potentially saving lives by making it so it takes them driving more recklessly for longer to cause an accident.

        For the professional driver, this is also a good thing – it allows for advanced vehicle control at a higher threshold of performance.

        And the risk of failure is STAGGERINGLY low. In the rare event that there is a failure, it would likely be amazingly obvious, and an alert driver can compensate for it the same way they would compensate for their own driving error in a non-protected vehicle – slow down and maintain control.

      2. I wrote a big long post explaining the function of each of those systems and how they could possibly fail, and I thought it posted fine, but it either got chewed up by the BB comment system (not unlikely, sadly) or it got nuked by an admin (possibly for being too long? well boo on me for being overly thorough, I suppose).

        Long story short, all of those features (except push-button start) are designed to allow a vehicle to stay in control longer than it would without them.

        If they failed, the greatest danger would be if you were driving at a high level of performance, such as in a tight turn or at high speed. Fortunately, if they fail at any other time it would be fairly easy to come to a safe stop, and they’re all pretty staggeringly obvious failures.

        The important thing to remember is that without them, you’d lose control beyond a certain point anyway. For the everyday driver, they help make it harder to lose control, thus meaning it takes greater (and less likely) degrees of recklessness / lapse of concentration to cause an accident. For the skilled or professional driver, it allows for a higher level of performance from the vehicle. Pretty much win-win all around.

        Oh, and if you want the more detail explanations, Wikipedia can tell you everything you could ever need to know!

        1. I agree with your assessment; however, it took a couple of vehicles with incorrectly designed ABS systems to create a system that fails "safe." Now, if any part of the ABS receives erroneous or unexpected data, it disables itself and allows the brakes to function in a basic mode (they are still a hydraulic system powered by your foot). It's why we have electronic throttles but not electronic brakes.* There were a couple of cars in the early ABS days that would not apply the brakes if a wheel-speed sensor read no speed. That is a bad fail state.
          *Yes, the system can apply the brakes on its own on some cars. That is not the same as you telling the computer to tell the brakes to activate. Vehicle brakes are a very complex system designed to always fail to a less functional, but still functional, state, even in the event of total failure of multiple system components. It's also why we still have e-brakes.

  10. My police friend says that when he investigates traffic mishaps he always repeats the mantra "There are no accidents." I finally figured out one day what he meant.

    1. That seems to be a popular mantra among people who think that ‘accident’ implies that the event was unavoidable and no one can be blamed.

      But, really, ‘accidental’ is the opposite of ‘intentional’, not the opposite of ‘avoidable’.  Most accidents could have been avoided, and generally  have plenty of blame to go around.

      1. intention vs. accidental, that’s a great definition!

        as to the idea of blame: we lack, perhaps, good words to separate situations of clear blame, and simple statistics.  we could all avoid car accidents by not driving, but we (mostly) accept that risk as reasonable.  there’s some point, though, at which we say: this thing a person did, that was wrong — but often its really just a statistic that hasn’t happened in our own lives yet.

      1. I'm guessing he means that what we call an "accident" is really an incident where one or more participants exercised poor judgement and decision making. Hence "mishap" (I had a cop friend who used to refer to them as misadventures).

  11. One thing that’s always worried me about automation in airplanes (and, potentially, other vehicles, if we start coming up with semi-autonomous cars) is how their failure mode is to simply dump everything into a human’s lap and say “best of luck, chump.” Insulating a human from the actual complexity of what they are doing, such as guiding an aircraft, only to dump everything into their lap when it hits the fan is a really bad idea.

    I think something similar happened in the commuter flight that crashed in New York a few years ago. The wings iced up, which the autopilot compensated for until it couldn't; then it bailed, and the pilots had to figure out what was going on. They made poor decisions (in part because of fatigue and poor training) and did exactly the same thing as the Air France crew: they tried pulling the nose up while the plane was stalling. They were just a few thousand feet up, so the stall was unavoidably fatal.

    I suppose what might help is telling the human, as loudly as possible, why the automation is kicking out. I'm not sure if, in the Air France case, there was a clear statement that "I'm bailing out because I have no idea what the airspeed is." That information might have been vital, and might have prompted the pilots to question whether the indicated airspeed was correct.

    1. Very good point! When the machine handles most situations, the people running it have no idea how far outside of ‘normal’ things have gotten, until it’s too late.

      1. No, precisely not this – that reports causes. The user needs a condition report: "Insufficient data – now in alternate law mode. The plane can stall." Don't ask them to remember what a lack of data implies – tell them what the newly triggered condition means for them and their actions.

        1. This would be fine if “alternate law mode” means the same thing every time but I seriously doubt it does.  The pilots need to know they’re in control now but they also need to know why so they know what to do next. 

          They might not need a complete rundown of everything that’s going wrong but they need at least a little bit of context.  Otherwise pilots will train themselves not to pull up even when pulling up is the correct thing to do.
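The condition-report idea in this thread can be sketched as a small lookup: map each detected cause to the condition and the consequence the crew must act on. The alert table, its keys, and the message wording here are all invented for illustration; no real avionics alerting system is being quoted.

```python
# Hypothetical cause -> (condition, consequence) table. The goal is to state
# what the newly triggered condition means for the crew, not just its cause.
ALERTS = {
    "airspeed_invalid": (
        "UNRELIABLE AIRSPEED",
        "Reverting to alternate law. Envelope protection lost: the aircraft CAN stall.",
    ),
    "autopilot_disconnect": (
        "AUTOPILOT OFF",
        "You have manual control now.",
    ),
}

def format_alert(cause: str) -> str:
    """Render a cause as condition plus consequence, rather than asking the
    crew to remember what the underlying cause implies."""
    condition, consequence = ALERTS[cause]
    return f"{condition} - {consequence}"

print(format_alert("airspeed_invalid"))
```

The design choice being argued over is exactly which half of that tuple the cockpit should shout: the cause, the consequence, or both.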

    2. I’m suffering mild confusion, “bailing out” is jumping out of an airplane, and comes originally from “bale out” as in a parachute bale, distorted over time by “bail out” as in using a bucket to get water out of a boat.

      Anywho… how should I be reading your comments about the autopilot bailing out? I’m afraid I might just be suffering from a moment of being dense. Just poetic metaphor?

      1. “bailing out” means “to quit doing something” in this context. like when you are the pilot, and you bail out of a plane, you are quitting at the task of flying it.

  12. About 15 years ago, I knew a guy who was a plane mechanic for United.  He would always say that the pilots were just “ballast”, that the planes did all the flying, and, even scarier, that the vast majority of accidents were due to pilot error but that it was typically publicly blamed on “mechanical problems” because that would freak people out less. To some extent I chalked it up to the typical rivalry that occurs between classes with different status, yet it had the ring of truth… 

    Modern systems make us less reliant on pilots to protect the plane and passengers, but that also means that in the increasingly rare cases when pilots are needed, they have less experience and are less prepared. Autopilot can take off, land, and do everything in between, so how many hours of actual hands-on flying of the plane do pilots get anymore?

    1. I would contend that your acquaintance doesn’t really know what he’s talking about. Mechanics generally have very little understanding of what pilots do (and some rivalry or envy is common) and I’ve seen a tendency among mechanics to talk shit about pilots. Not saying that my experience of pilots makes the pilots any better (and they generally have a very limited understanding of the details of what the mechanic does)…

      The planes do all the flying? Depends on how you look at it. Autopilot does NOT take-off (although it would technically be possible) and the crew spend half an hour (at least) before take-off setting everything up in the airplane (after having made key decisions about the flight in the ops room). It’s true that not much flying on passenger jets gets done by hand these days, that is, the pilots generally don’t steer much by holding the horns and manually flying. Mostly the first few minutes after take-off and possibly parts of the approach to landing (and the landing). Instead they steer by changing settings. You want to turn left to heading 235°? Turn the heading bug to 235 while the autopilot is in hdg mode. Required change in flight level? Set a new target altitude, and set the autopilot to follow that new altitude. The pilots have become mostly systems operators, and emergency controllers (as well as executive decision makers) during high risk phases of flight such as take-off, initial climb, approach, and landing.
      The funny thing is that even though machines seem really advanced and it looks like accidents are due to stupid humans, the thing is that the machines make even the most boneheaded decisions made by pilots look like strokes of genius. The machines (including state of the art decision making systems) are really that stupid. They have a long ways to go just to reach cockroach level. They work fine as long as there are no unknowns, and no “damned if you do, damned if you don’t” decisions to be made. It’s a pity that even normal operation tends to contain unknowns and more, let alone during emergencies.

      Pilots of big jets mostly do hands-on flying in the simulators and in the small hobby planes many fly on their days off. The simulator sessions are grueling training sessions, with an amazing array of strange emergencies thrown at the pilots, and the most common emergencies trained to perfection (which is why you rarely hear of jets going off runway ends after aborted take-offs these days). They really are very, very good (in general), and the training at most airlines is somewhere between really good and amazing. You can see that in events such as the amazing landing on the Hudson river, performed after an aircraft met nearly the worst-case scenario (the true worst case would have had worse visibility, due to low clouds or darkness, in which case there might not have been any survivors; no amount of skill could have compensated for that). A mixture of luck and skill is important in any major emergency. Problems (involving delays) are often attributed to mechanical causes, but blame for accidents generally gets put on the pilots, even where it is totally unrealistic to expect them to have done anything differently. The Airbus crash at Mulhouse-Habsheim is an interesting example. The crash was due to the aircraft being in approach mode, while the button and the light for selecting and showing modes were not in the same place and not very visible. When a problem became apparent, the pilots didn’t see the mode issue and did what they would have done on any other aircraft: they advanced the power levers and tried to pull the aircraft up. This spooled up the engines but did not get the aircraft to climb, as approach mode meant that it wouldn’t climb without going into “go-around” mode. The plane settled into the forest beyond the runway, and three of those on board died.
      Airbus denied that there was anything wrong with the design, or that the design might have been a factor, yet they redesigned this part of the system. Pilot error… We used to joke that if a crevasse were to open in the middle of the runway while we were landing, and we went into it, it would be categorised as pilot error…
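The “steering by changing settings” description above can be sketched in a few lines of code. This is purely a toy model of my own, not real avionics; the class, mode names, and numbers are all illustrative.

```python
# Toy model of target-based flying (my own sketch, not real avionics):
# the pilot's "inputs" are settings that the autopilot then chases.
class Autopilot:
    def __init__(self):
        self.mode = "HDG"            # lateral mode: follow the heading bug
        self.target_heading = 90
        self.target_altitude_ft = 35000

    def set_heading_bug(self, degrees):
        # "You want to turn left to heading 235? Turn the heading bug to 235."
        self.target_heading = degrees % 360

    def set_target_altitude(self, feet):
        # "Required change in flight level? Set a new target altitude."
        self.target_altitude_ft = feet

ap = Autopilot()
ap.set_heading_bug(235)
ap.set_target_altitude(37000)
print(ap.target_heading, ap.target_altitude_ft)  # 235 37000
```

The point of the sketch is that nothing here resembles flying by hand: the pilot operates the system, and the system flies the plane.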

    2. The question is, if we compare modern pilots to pre-computerization pilots, what will we see?

      Were early pilots more experienced with handling major crises? Was fatigue more of a problem for early pilots because they had to be constantly alert? Were early airframes less reliable, or less able to withstand adverse conditions? Are there a greater number or greater percentage of accidents now, or fewer? Are regulations better now? Is maintenance better now? Do modern pilots fly more often? For longer hours and over greater distances? How does navigation compare?

      There are just so many factors to consider. It used to be that you only had your eyes, your instruments, and a radio that “maybe” worked. There were times when flying across the US you would enter radio dead zones, out of range of communications because of terrain or distance. Even when in range there was no guarantee you’d not face interference, physical radio failure, radio operator failure (on either end), or just plain poor reception. There were certainly fewer tools to combat bad weather years ago. Modern planes have a damn hard time flying off course or running low on fuel, and can even make landings COMPLETELY BLIND when faced with fog or other vision obstructions.

      Flight is tricky.

      1. I think what it takes to fly a plane has changed quite a bit over time. Therefore the Joint Cognitive System (JCS) has changed. As a JCS evolves, there is a tendency to get a “Stretched System” (Woods and Hollnagel, 2006). Under pressure for higher and more efficient levels of performance, the organization will exploit new capabilities to demand more complex forms of work. This further stresses the JCS, particularly the human decision-maker.

  13. Many years ago, I read Normal Accidents, James R. Chiles’s Inviting Disaster, and Diane Vaughan’s The Challenger Launch Decision all at about the same time. They’re all good in their own way, but of the three I found Inviting Disaster the most readable and accessible, and now it’s the only one I still own.

    Along the lines of figuring out what is happening and how to react to it, I’d suggest Malcolm Gladwell’s article on “The Art of Failure”, wherein he describes the difference between choking and panic.

  14. This is the first real explanation I’ve seen in the media of why the pilots were so (supposedly) foolish as to continue to pull on the yoke instead of pushing on it to gain airspeed. Makes perfect sense…and I venture that most “highly trained” pilots would have reacted the same way, even if this scenario had been mentioned to them in a class, likely years prior. A failure of human-factors engineering.

    Good job, Popular Mechanics, here as well as with the 9/11 WTC analysis, which rendered the conspiracy theories laughable…assuming an objective and critical reading of both sides of the issue. From now on, the 9/11 theories should all be handled by Psychology Today.

    1. I agree with you regarding Popular Mechanics and their analyses. They appear to have a very sharp staff working there.

  15. The Air France accident is yet another example of a Joint Cognitive System (JCS, human-technology team) failure. With the unconstrained growth in the complexity of technology, it is the human decision-maker who is left to absorb the stresses from situational factors, because it is the only part of the system that can adapt. This is an area where Cognitive Systems Engineering (CSE), a field that grew out of another JCS failure – Three Mile Island – can help mitigate these stresses. CSE attempts to understand how humans cope with complexity and to design systems that leverage these adaptive mechanisms. The 447 crash shows several failure patterns that CSE has documented in other domains – mode awareness, teaming with automation, lack of observability.

    When designing a new system interface, like an aircraft cockpit, the CSE designer must consider: how the system will cope with complexity, how to engineer a JCS so that the human-technology team is treated as a cohesive cognitive system, and how the interface to the technology artifacts affects specific work functions. The tragedy is that there are approaches to mitigate these issues, but because most development efforts lead with technology and only consider the human operator at the end, we are destined to repeat incidents like the 447 crash, Bhopal, and Three Mile Island over and over again.

    1. What he said!
      The lack of visibility (visibility of the state of the system) coupled with mode awareness is a major issue that needs work. Fortunately there are people working on this, and there has been a real awakening around issues like these. I seriously think that there have been pretty massive changes in how design is approached, with much more focus being put on the man-machine interface, and with the cognitive capabilities of the operator (stress and time-pressure factored in) being taken into consideration these days.
      Of course, most expensive systems in use are older models (if something costs millions or billions, it tends to stay in use for a while), so updating takes time, and aircraft cockpit/system design can do even better than it has done so far.

  16. Perhaps a good example of this is when you drive with cruise control on.  If you come up on an unexpected curve or something else requiring you to slow down a bit, there’s a certain instinct just to take the foot off the gas.  But it’s already off the gas, which leads to a bit of “I’m doing what I should but it’s not working!” thought.

  17. I can’t tell you the number of times, working in software, that I’ve kept doing the wrong thing due to faulty assumptions about “oh, it can’t be failing because of *that*, we already fixed it / that can’t fail / it can’t be that easy / etc”.

    At one job, I inherited a computer that wouldn’t boot.  Spent all day with in-house IT thinking there was a power supply problem, because I’d had that issue before on a couple other computers at home.  When I finally cracked open the case, I found it was missing all its RAM and the CPU.

    1. As soon as you find yourself thinking, “But it HAS to work!” with software, it’s a clear message that you’re assuming something that’s wrong.  

      1. Even working as a tester for several years, it’s hard to overcome that instinct.  Sometimes it’s due to a myopic focus on the “expected” problem, when the real issue is somewhere else.

        Other times, so many things are broken it’s tempting to just throw it back and ask for a working version. Gotta have a real good reason to do that, though :)

  18. I’m thinking back to May 31, 2011, when NPR’s Talk of the Nation was covering the initial report on the crash. They brought in John Cox (“airline safety consultant and former commercial pilot”) to talk about flying one of these behemoths. Can’t find audio of the broadcast, but the gist of Cox’s remarks was that once one of these planes starts falling in a stall, there’s really no recovering. Lottsa callers to the show were small and midsize plane pilots (one- and two-engined jobs, like Cessna 208 Turboprop or Gulfstream G200 twin-engine jet), and to a one they all expressed amazement that Flight 447’s pilots hadn’t simply pointed the nose down to overcome the stall….

    Cox had to drive the point home several times: once a jumbo has stalled, it’s too late to do anything to fix it. In short, the only way to recover from a stall is not to stall. (À la W.O.P.R.: “The only winning move is not to play.”)

    I haven’t followed up on the matter, so maybe other experts have other opinions, but it stuck in my mind that fatal flaws can be the result of conscious design decisions, from which chaos can make what it will, with no successful human intervention possible; think Pinto gas tanks or early Soviet space capsules.

    (My goodness….Disqus is certainly being rude today)

  19. Perrow’s book, and Scott Sagan’s companion book “The Limits of Safety”, have been getting a good going-over among economists these days, most impressively by Andrew Lo at MIT. Normal Accident Theory has a lot of application in the financial world, where systems are so complex that no one can possibly observe or understand them all, couplings are unforgivably tight, and transactions are placed by the millions by fallible human decision-makers and overseen by overwhelmed and out-of-touch regulators. My copy of Perrow’s dog-eared, tape-flagged book has been on my desk since 2009. Reading that book on a plane is a bad idea.

    I’m afraid that a slow-motion normal accident is taking place in Europe right now.

  20. If you don’t plan for failure you plan to fail!

    A lot of the problem is that *all* systems (at least until we hit the Robotic Rapture) are cybernetic systems: they have a metal and a flesh part.  However, because they’re largely designed by people who obsess over the metal part and who religiously and willfully ignore the flesh part they tend to have these fatal blind spots.  The people involved are assumed to be perfectly rational beings who would never, say, hallucinate under stress: in effect the sort of characterisation that even Marvel Comics has moved beyond.

    It’s the new version of what always happens with the pre-computer paper IT systems. Something bad happens. So an administrative procedure is put into place to make sure “that never happens again”. Eventually you get 300-item checklists that everyone ignores and incredibly complex committee structures that spin wildly out of control the first time anything even slightly out of the ordinary happens. I have a friend who basically makes a living out of freaking out various government administrations this way. The people involved have no idea why they’re cutting her a cheque, and it doesn’t make any sense, but “policy” leaves them no choice. See Temple Grandin’s seminal work on how slaughterhouse safety policies really work for other great examples of this. Ironically, it’s an insight that seems to be much easier for autistic people to have than for neuro-typicals. Poor “theory of mind”, I guess. We, on the other hand, know from the start that people behave as chaotically and unpredictably as any machine.

    In the old systems it’s the accountants, as opposed to the programmers, who hate the idea of having people actually make decisions and taking responsibility for them.  All reality must be contained in the Sacred Books of Accounting.  The classic example is the accounting wonk who decided to replace a system of cheques with a direct deposit system for student housing.  The fact that the cheque system was specifically put in place to make the students show up at the dorm office at least once a month was beyond his comprehension.  Without the safety check of the cheques there were then several suicides and overdoses that wouldn’t have happened under the old system: but $3 per student per month was saved!

  21. But what happens if what’s actually going on doesn’t mesh with your training? Then the brain finds ways to make it mesh. Those rational explanations might make a whole lot of sense to you, in the moment. But they will lead you to make mistakes that exacerbate an already growing problem.

    Sounds a lot like spending $57B on useless terrorist protection….

  22. For me the biggest takeaway from the Popular Mechanics article was the flawed Airbus systems. While the pilots did make mistakes, the Airbus system designs amplified and/or led to them.

    For example, why would there be an audible and persistent stall warning if the plane’s “normal law” would never let a stall happen? It seems like the warning should have been reserved for “alternate law”. That created a situation where a so-called warning was actually meaningless and increased confusion and stress.

    No feedback between the yokes seems like another major design flaw. And shockingly there wasn’t any mention of an overt indicator that the plane switched between normal and alternate law. It could have been something simple yet obvious, like cockpit lights changing to alternate colors. Think submarine movies where the control room lights switch to red during combat mode.

    I’m hoping Airbus has fixed these flaws. This crash does show how carefully thought out every system needs to be in all conditions.
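The “reserve the warning for alternate law” idea above can be sketched in a few lines. This is purely a toy of my own, not Airbus logic; the mode names and the 15° stall threshold are made-up illustrative values.

```python
# Toy sketch of a mode-gated stall warning (my invention, not Airbus
# logic). In normal law, envelope protection prevents the stall, so a
# warning there would be noise; in alternate law it carries information.
NORMAL, ALTERNATE = "normal law", "alternate law"

def stall_warning(control_law, angle_of_attack_deg, stall_aoa_deg=15.0):
    """Sound the warning only when a stall is actually possible."""
    if control_law == NORMAL:
        return False  # protections make the warning meaningless here
    return angle_of_attack_deg >= stall_aoa_deg

print(stall_warning(NORMAL, 20.0))     # False: protections active
print(stall_warning(ALTERNATE, 20.0))  # True: stall is a real threat
```

Gating the alarm this way would make it mean one unambiguous thing, which is exactly the property the comment argues the real warning lacked.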

    1. It could have been something simple yet obvious, like cockpit lights changing to alternate colors.

      Ha! I was thinking the exact same thing while I read the article.

  23. The Popular Mechanics piece is an excellent article. I’ve read some books about interface design on planes. Dual control of the yoke does more than just give pilots the ability to see each other’s movements: it lets them feel those movements physically when their eyes are focused on the various instruments.

     As the PM writer Jeff Wise obviously knows, the “tunnel effect”, where your perceptions narrow, would perhaps kick in in this situation. Robert could technically have seen Bonin pulling the stick back, but with narrowed perceptions and a focus on the airspeed indicators, he may have lost that visual input. With a dual yoke system he would still have felt it, without having to look, and known that Bonin was doing the wrong thing.

    What I like about the study of this tragedy is that people are looking at the whole system, so they can deal with design issues, technology issues, and human issues.

     Think about how many situations there are where the wrong input, or a misunderstanding of the input, leads to tragedy. Even if you can fix those, you still have to deal with people irrationally (or willfully) doing the wrong thing.

    Bonin’s irrational choices were not fed to either Robert or the Captain until too late.  Failure in communication, via the yoke system or words, is also a big part of the problem.

  24. I think the human brain has a hard time when one specific type of incident calls for different actions in different scenarios.

    In the air, what do you expect? Is the pilot supposed to know how to immediately repair every minute failure of the aircraft? A pilot is not an engineer or mechanic, he’s a pilot. 

    How many survivable mechanical failures occur; what are the numbers? How do we plan on training every pilot to repair every conceivable or unforeseeable mechanical issue, especially ones that may never happen in his or her entire flight career? That’s why we have “preventative maintenance”. Inspections and repairs are the only realistic safeguards for any mode of travel.

    Let us take “Running Out of Fuel” (highest probability):
    Landcraft: You come to a gradual rolling stop.
    Watercraft: Climate is an issue; drift, grounding, sinking; drop anchor/parachute anchor if possible; rescue may not be immediate. (Preparation for these scenarios is required by law.)
    Not all aircraft respond the same:
    >Jet: Crash, or glide to a safe landing
    >Rocket: Catastrophic
    >Helicopter: Catastrophic
    >Blimp/Balloon: Slow descent, landing spot an issue, hard landing.

    The automobile industry realizes this too.

    My Dodge’s P0455 code (cycle the key a few times when the check engine light comes on) directs me to a few possibilities (per http://www.obd-codes.com/p0455): Evaporative Emission Control System Leak Detected (no purge flow or large leak). This indicates a fuel vapor leak or lack of purge flow in the EVAP control system; it means a large leak has been detected. The EVAP emission control system prevents the escape of fuel vapors from a vehicle’s fuel system. Fuel vapors are routed by hoses to a charcoal canister for storage. Later, when the engine is running, a purge control valve opens, allowing intake vacuum to siphon the fuel vapors into the engine.

    With a P0455, the most common repair is to: remove and reinstall the gas cap, clear the codes, and drive for a day to see if the codes come back. Otherwise, replace the gas cap, and/or inspect the EVAP system for cuts/holes in tubes/hoses (you may hear a vacuum noise or smell fuel) and repair if necessary. (It’s always been the cap.)

    My Dodge’s P0455 code is somewhat of a luxury code; I have time to address it later and can continue using my vehicle.

    No kind of plane can operate with this type of warning system. It has to be specific, because certain malfunctions have to be attended to immediately.

    Ever take anything apart, put it back together and have one screw left? If it’s a plane, you’re not taking off.

  25. The only thing any pilot needs in a Commercial Airliner to avoid these accidents is an Angle of Attack Indicator. Any Fighter Pilot who walks onto the deck of a Commercial Jet will immediately say,  “Where TF is the AOA indicator, are you guys insane?”

  26. There is no such thing as a risk-free day on earth, and there never will be, regardless of scientific breakthroughs.
    A 95/05 chance of living is probably an excellent day. We normally, unknowingly, operate most days with less.

  27. It appears that many incidents are related to airspeed and attitude.  Given the use of glass cockpits these days perhaps two simple indicators might aid in these situations. 

    1. Attitude indicator: Like the Russians use on Soyuz spacecraft as a backup, simply a mass (a mascot, in the Soyuz case) on a string in the cockpit. If it points down, you are level; leans back, you are climbing; leans forward, you are descending; sits on the roof, you are inverted! The plane equivalent of fluffy dice.

    2. Speed indicator: A simple spring outside the pilot’s and co-pilot’s windows that deforms visibly with airspeed. Some funky colour, visible day and night. Not hyper-accurate, given its position on the surface, but enough to give a rough order of airspeed. With time, the pilots will associate its behaviour with the known airspeeds they use when all the systems are working well, and when the two disagree, the pilot will notice. If the cockpit goes black, it still works.

    This way you have a non-avionic backup indication of these two things to refer to when all the alarms and buzzers go off.

  28. First, this was a gripping article I strongly suggest you read.   

    What struck me in the article was the asynchronous “joystick”/yoke controls. With one pilot pulling back, the other stick did not move in sympathy (as we would expect, having watched dozens of airliner movies over the years). This lack of a tactile alert could have been the difference between life and death. Had the other pilot noticed “why is this control stick all the way back?”, the corrections could have taken place far earlier.
    I think about this in more advanced systems: the sort of “last chance” option that defies computer programming and is rendered in an old-fashioned, more blunt-force way. Anyone who has driven in the mountains notices pull-offs filled with gravel (runaway truck ramps). A “trip cock” on a train system engages the brakes should a signal be passed. Sometimes a simple option isn’t a bad option.

    Anyway, a very important read.  Very tense.  

  29. Reminds me of “Up Against It” written by M. J. Locke.
    The Life-System-AI becomes a threat to the population when it tries to save a life

  30. At the risk of oversimplifying, who would ever let out more string when a kite is dropping? That’s apparently the nosing-up pilot error in 447 and Colgan Air 3407 (approaching Buffalo a few years back). Both cases involved extreme weather and autopilot, coupled with “incomprehensibility.” Right there, isn’t that your use case for an algorithm to predict a stall scenario, then yank in the string? . . .
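    The “algorithm to predict a stall scenario” suggested above could be sketched as a simple projection rule. This is a toy of my own devising, not any certified protection system; the stall threshold, margin, and lookahead numbers are arbitrary illustrative values.

```python
# Toy stall predictor (my own sketch, not a certified system): project
# angle of attack a couple of seconds ahead and command nose-down
# ("yank in the string") before the projected value reaches the stall.
def needs_nose_down(aoa_deg, aoa_rate_deg_s, stall_aoa_deg=15.0,
                    margin_deg=3.0, lookahead_s=2.0):
    """True if AoA is projected to cross the stall threshold minus margin."""
    projected = aoa_deg + aoa_rate_deg_s * lookahead_s
    return projected >= stall_aoa_deg - margin_deg

print(needs_nose_down(5.0, 0.0))   # False: well inside the envelope
print(needs_nose_down(10.0, 2.0))  # True: projected 14 deg vs 12 deg limit
```

    The hard part, of course, is everything the toy omits: trusting the sensors (AF447’s airspeed data was exactly what failed) and deciding when the automation should override the human.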

  31. 1) Why isn’t airspeed data from pitot tubes checked against GPS?

    2) Why aren’t the pitot tubes thermostatically heated to prevent icing over?

    1. 1) GPS measures groundspeed; pitots measure airspeed. There’s a big difference: at 35,000 feet, winds can run to well over a hundred knots, so the two can diverge dramatically. I recall seeing a picture of an airliner console showing a groundspeed higher than the speed of sound; similarly, in a small, light aircraft in the right conditions, it’s possible to have zero groundspeed – or even to fly backwards!

      2) The pitot tubes are heated – but the ones on AF447 were a bad design (search for “thales pitot” for more info). They were supposed to be replaced at the next major maintenance check – but that didn’t quite happen.
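A toy illustration (my own numbers, nothing from the article) of the groundspeed/airspeed distinction in the reply above: GPS sees motion over the ground, the wings only feel motion through the air, and wind is the difference between the two.

```python
# Toy arithmetic (illustrative numbers only): why GPS groundspeed
# cannot substitute for pitot airspeed.
def groundspeed(true_airspeed_kt, wind_component_kt):
    """Wind component: positive for a tailwind, negative for a headwind."""
    return true_airspeed_kt + wind_component_kt

# A light aircraft at 60 kt into a 60 kt headwind: zero groundspeed,
# yet it is flying normally and nowhere near a stall.
print(groundspeed(60, -60))    # 0

# An airliner at 480 kt with a 120 kt jet-stream tailwind: 600 kt over
# the ground, while the wings still feel only 480 kt.
print(groundspeed(480, 120))   # 600
```

Cross-checking the two can flag a disagreement, but the GPS number alone says nothing about how close the wings are to a stall.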

  32. Another good read for you to consider: To Engineer is Human by Henry Petroski. It examines a number of engineering disasters through history to draw the following thesis/conclusion: we build stuff, we over-build it for safety reasons, it works, we build the next thing a bit more complacently and a bit more cost-consciously… and eventually we have another disaster, and then vow to overbuild again.


  33. Confusion in the cockpit and loss of situational awareness is often a factor in crashes, as chillingly shown by transcripts in the horrifying yet fascinating The Black Box by Malcolm Macpherson.

    Petroski’s book is good too.

  34. The lesson here is in the very first line of the synopsis: “At 1h 36m, the flight enters the outer extremities of a tropical storm system. Unlike other planes’ crews flying through the region, AF447’s flight crew has not changed the route to avoid the worst of the storms.”

    Basic rule – avoid bad weather; don’t needlessly test your aircraft/crew system.

  35. I think this boils down to over-reliance on the proverbial technology crutch. We delegate more and more tasks to computers, without properly training humans how to react when these crutches malfunction. Worse, the mere possibility of a malfunction becomes taboo, because it is assumed that the mere mention of this would result in stress.

    This is not limited to planes: now you see cars with HUDs, systems that automatically turn on blinking lights when you change lanes, collision radars and so on. 

    The other day I was watching a documentary about the Lofoten Islands in Norway, one of the most dangerous coasts there is (that’s where you find the infamous Maelstrom). There are many lighthouses there that are not permanently manned anymore. People go out regularly to check on the systems, and one of them was asked whether this whole setup is even necessary anymore, what with everyone using GPS. He answered: “And what happens when you’re at sea, at night and during a storm, and your GPS breaks down?” Because it WILL break down, usually at the worst moment. And the scariest part is, most people are so used to their GPS that they mightn’t be able to navigate by sight well enough anymore…
