Blackout: What's wrong with the American grid

It began with a few small mistakes.

Around 12:15 on the afternoon of August 14, 2003, a software program that helps monitor how well the electric grid is working in the American Midwest shut itself down after it started getting incorrect input data. The problem was quickly fixed. But nobody turned the program back on again.

A little over an hour later, one of the six coal-fired generators at the Eastlake Power Plant in Ohio shut down. An hour after that, the alarm and monitoring system in the control room of one of the nation's largest electric conglomerates failed. It, too, was left turned off.

Those three unrelated things—two faulty monitoring programs and one generator outage—weren't catastrophic, in and of themselves. But they would eventually help create one of the most widespread blackouts in history. By 4:15 pm, 256 power plants were offline and 55 million people in eight states and Canada were in the dark. The Northeast Blackout of 2003 ended up costing us between $4 billion and $10 billion. That's "billion", with a "B".

But this is about more than mere bad luck. The real causes of the 2003 blackout were fixable problems, and the good news is that, since then, we've made great strides in fixing them. The bad news, say some grid experts, is that we're still not doing a great job of preparing our electric infrastructure for the future.

Let's get one thing out of the way right up front: The North American electric grid is not one bad day away from the kind of catastrophic failures we saw in India this week. I've heard a lot of people speculating on this, but the folks who know the grid say that, while such a huge blackout is theoretically possible, it is also extremely unlikely. As Clark Gellings, a fellow at the Electric Power Research Institute, put it, "An engineer will never say never," but you should definitely not assume anything resembling an imminent threat at that scale. Remember, the blackouts this week cut power to half of all Indian electricity customers. Even the 2003 blackout, the largest blackout in North America ever, only affected about 15% of Americans.

We don't know yet what, exactly, caused the Indian blackouts, but there are several key differences between their grid and our grid. India's electricity is only weakly tied to the people who use it, Gellings told me. Most of the power plants are in the far north. Most of the population is in the far south. The power lines linking the two are neither robust nor numerous. That's not a problem we have in North America.

Likewise, India has considerably more demand for electricity than it has supply. Even on a good day, there's not enough electricity for all the people who want it, said Jeff Dagle, an engineer with the Pacific Northwest National Laboratory's Advanced Power and Energy Systems research group. "They're pushing their system much harder, to its limits," he said. "If they have a problem, there's less cushion to absorb it. Our system has rules that prevent us from dipping into our electric reserves on a day-to-day basis. So we have reserve power for emergencies."

None of this means the North American grid is a perfect, or even an ideal, system. The electric grids that exist today evolved; they weren't designed by anybody. Every electric grid on Earth is flawed, but they're all flawed in different ways. So we can talk about serious problems with the North American grid, but that doesn't mean that you should be stocking up on home generators and canned peas in preparation for an India-like event. The scale is different, and the problems are different, too.

All the Small Things

So what did cause the 2003 blackout? There were a couple of key issues, but at least one is likely to surprise you. FirstEnergy, the conglomerate that owned both the broken generator and the failed alarm system, had also been lax about trimming trees near its power lines. It's an amazingly simple, non-techy problem, but it mattered.

I like to say that the grid is a lot like a lazy river at a waterpark. It's not a line, it's a loop—power plants connected to customers and back to power plants again. And like the lazy river, it has to operate within certain parameters. The electricity has to move at a constant speed (an analogy for what the engineers call frequency) and it has to flow at a constant depth (analogous to voltage). In order to maintain that constant speed and constant depth, you have to also maintain an almost perfect balance between supply and demand … everywhere, at all times. So when one generator goes out, the electricity it was supplying has to come from someplace else. Like a stream flowing into a new channel, the load will shift from one group of transmission lines to another.

But, the more electricity you run along a power line, the hotter the power line gets. And the hotter it gets, the more it droops, like a basset hound in a heat wave. If nearby trees aren't trimmed, the lines can slump too close to the branches—which creates a short circuit. When that happens, the loads have to shift again. All of this disrupts the speed and the depth on the river of electrons. The more lines you lose, the more likely it is that the remaining lines will, themselves, droop into something. The more lines that short, the more power plants have to shut down to protect themselves from fluctuations in frequency and voltage. The more times you have to shift load around, the more the grid starts to get away from you. In 2003, six transmission lines went down in a row, several of them major channels for the flow of electricity. Those losses were what turned a small series of mistakes into a catastrophe.
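To make that chain reaction concrete, here is a toy Python sketch. It is not how engineers actually model power flow: it assumes every surviving line carries an equal share of the load, and the line limits and load figures are invented for illustration. The `simulate_cascade` function and its parameters are hypothetical names, and each "limit" stands in for the loading at which a line would sag into a tree or otherwise trip.

```python
def simulate_cascade(total_load, limits, first_failure):
    """Toy cascade model (an illustration, not a real power-flow study).

    total_load    -- load the lines must carry together (arbitrary units)
    limits        -- per-line loading limit; above it, the line trips
    first_failure -- index of the line that is lost first

    Returns (order in which lines tripped, lines still in service).
    """
    in_service = {i for i in range(len(limits)) if i != first_failure}
    tripped_order = [first_failure]

    while in_service:
        # Simplifying assumption: surviving lines split the load equally.
        share = total_load / len(in_service)
        overloaded = sorted(i for i in in_service if share > limits[i])
        if not overloaded:
            break  # the remaining lines can carry the load
        for i in overloaded:
            in_service.remove(i)
            tripped_order.append(i)

    return tripped_order, sorted(in_service)


# Six hypothetical lines with made-up limits.
limits = [150, 160, 170, 200, 220, 260]

# Lightly loaded system: losing the biggest line is absorbed.
print(simulate_cascade(600, limits, first_failure=5))
# ([5], [0, 1, 2, 3, 4])

# Heavily loaded system: the same loss takes every line down.
print(simulate_cascade(780, limits, first_failure=5))
# ([5, 0, 1, 2, 3, 4], [])
```

The only difference between the two runs is how hard the system is being pushed before the first line is lost. With less cushion, every trip shoves the next-weakest line past its limit.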

A Failure to Communicate

Even more important than the untrimmed trees, though, was the lack of communication.

The North American electric grid is a patchwork quilt, not a single entity. It's made up of chunks controlled by different—and often competing—utility companies. Those chunks are aggregated into management districts. In the case of the Eastern part of the continent, all of the management districts are aggregated into a larger joint district. There are a lot of hands working to make sure the grid operates the way it should. But those hands don't always know what the others are doing, at least not fast enough.

The issue is something that grid experts call situational awareness—basically, the big picture. In 2003, the people trying to stop the blackout didn't have a clear view of it. Partly, that had to do with the faulty software program that wasn't turned back on and the alarm system failure that apparently went unnoticed. But it was also just how the grid worked. The systems in place to tell grid controllers what the electrons were doing moved a lot more slowly than the electrons themselves.

In 2003, it took about 30 seconds for data about what was happening on the grid to be gathered, compiled, analyzed, and displayed in a way that grid controllers could use. That sounds pretty fast, until you consider the fact that changes on the grid happen much, much faster***. If a power plant goes offline in Arizona, it can create a measurable effect in Canada in about a second. If your view of the grid is updated only every 30 seconds, you miss important details. After the 2003 blackout, grid experts went back and essentially replayed the whole thing in a computer modeling program. The idea was to try to get a better idea of where things went wrong and how a similar event could be prevented in the future. They found that, about an hour before the blackout, the grid was showing signs of stress that controllers didn't see at the time, said Carl Imhoff, manager of the Energy and Environment Sector at PNNL. It wasn't the controllers' fault. They simply didn't have the technology to see the big picture.
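A small Python sketch shows why the refresh rate matters. It treats the 30 seconds as a simple sampling interval, which is a simplification of the real gather-compile-display pipeline. The 60 Hz nominal frequency is real for North America, but the five-second dip, its size, and its timing are invented for illustration, and `build_signal` and `view` are hypothetical helper names.

```python
NOMINAL_HZ = 60.0  # nominal North American grid frequency


def build_signal(seconds=120, dip_start=47, dip_length=5, dip_hz=59.7):
    """One reading per second, with a brief made-up frequency dip."""
    signal = [NOMINAL_HZ] * seconds
    for t in range(dip_start, dip_start + dip_length):
        signal[t] = dip_hz
    return signal


def view(signal, refresh_seconds):
    """What a controller sees if the display updates every refresh_seconds."""
    return signal[::refresh_seconds]


signal = build_signal()
slow_view = view(signal, 30)  # updates at t = 0, 30, 60, 90
fast_view = view(signal, 1)   # updates every second

print(min(slow_view))  # 60.0 -- the dip never shows up
print(min(fast_view))  # 59.7 -- the dip is visible
```

In this made-up example the disturbance starts and ends between two updates of the slow display, so the slow view looks perfectly healthy the whole time.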

Fixing the Grid

Today, that technology exists. Phasor Measurement Units are kind of the opposite of sexy. Also known as PMUs, they're just anonymous little boxes that sit on server racks in electrical substations. But phasors are linked into transmission lines. They see what's happening on the line: how well supply and demand are balanced, and whether voltage and frequency are stable and within the normal range. That's just one point of data, recorded in one place. But a network of phasors can tell you a lot. It can show you, for instance, if the stability of the grid is changing as electricity moves from Cleveland to Columbus. And the phasors process that information far more quickly. Today, our grid can give controllers information about the big picture in less than 10 seconds. Researchers like Massoud Amin are working on getting that response time down to less than 3 seconds.

If we'd had a phasor network in 2003, grid controllers would have had that hour of warning about the problem. There's a good chance they'd have been able to fix it, or, at least, make the resulting blackout smaller and more localized.

When it comes to PMUs, 2003 was really a wake-up call. It led utilities and the government to team up to install a true phasor network throughout the United States. That effort is currently ongoing. In 2009 there were maybe 200 phasors in operation. By the end of 2013, there will be more than 1000 installed throughout this country. Over the last five years a partnership between federal Recovery Act funds and private industry dollars has invested $7.8 billion in upgrading the grid, Massoud Amin said.

The problem, he added, is that this isn't nearly enough.

Our grid is old. The average substation transformer is 42 years old, two years older than the designed lifespan of a substation transformer. For the most part, our grid hasn't been modernized. It's largely mechanical equipment operating in a digital world, Clark Gellings said. Perhaps most importantly, the grid isn't being prepared for the future.

"From 1995-2000, the electricity sector put less than ⅓ of 1% of net sales into research and development," Massoud Amin said. "In the following six years, that number dropped to less than 2/10 of 1%. We are harvesting the existing infrastructure more and investing less and less in the future."

Phasor networks are a success story in the making. So are new national rules Gellings told me about, which put a much higher penalty on utility companies that don't keep their trees trimmed. One untrimmed tree can cost $1 million in fines. All of this will help prevent blackouts of the size we had in 2003. But it doesn't help deal with what's coming 20-30 years down the road.

It's not just that the infrastructure itself will eventually age out. Where we get electricity from, who uses it, and how much we use are all changing. In the future, we're going to have more electricity production happening in the rural Midwest, where wind resources are most abundant, but the people will still live far away. We keep using more electricity, in general, and we're more dependent on it now. We're only going to become more dependent in the future. Jeff Dagle told me that improvements are being made, but they might not be moving fast enough if there's a major change in energy use, for instance if Americans start buying electric cars at higher rates than they do today.

The frustrating thing is that this isn't simply a technology problem. It's also social and political. Just like the national grid is really a patchwork of grids, it's also a patchwork of regulatory systems. That uncoordinated mixture of regulation and de-regulation often fails to incentivize the investments the grid actually needs. Building transmission lines, for instance, is a job that crosses multiple states. Many of those states aren't going to get a direct benefit from the line, even if that's what's best on the whole. Local regulators may understand that, but when they have to operate in the best interests of their state or county, they might still challenge the line, Gellings said. This is part of why it can take as long as 12 years to get a single new transmission line built. In another example, de-regulation in many states has created a confused system where there are now lots of stakeholders in the electric grid, but nobody has an incentive to think about, or invest in, the long term.

If we want the grid to work as well three decades from now as it does today, we need to put some money into it. Massoud Amin has estimated the cost of grid improvements. To make the grid stronger—adding more high-voltage lines and upgrading the existing ones—he says we need to spend about $8 billion a year for 10 years. To make the grid smarter—digital, centralized, automated, and with the kind of big-picture communication that helps us stop blackouts before they happen—it'll take an investment of $17-20 billion a year for 20 years.

That sounds like a lot of money. That sounds completely undoable. And maybe it is. But Amin says you have to think about what you're saving, as well. Remember how much the 2003 blackout cost us? Most blackouts that happen aren't that big. They're local things that happen to your neighborhood, or your town, or your county. But they happen a lot. Depending on what part of the United States you live in, the grid averages 90-214 minutes of blackout time per customer, per year*. And that's not even counting the blackouts that happen because of extreme weather or other disasters, like fires. All that downtime adds up. Amin says the average cost is more than $100 billion per year.

And that's the difference between an expense and an investment. Over time, the investment pays for itself.**

*Japan, in contrast, averages 4 minutes of interrupted service per customer, per year.

**Massoud Amin estimates that these investments would save $49 billion a year that would otherwise be lost due to blackouts. The improvements would also make our grid more energy efficient, which he says could save an additional $20 billion annually in energy costs. You can read more about this in the reports he's written about his research.
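For anyone who wants to check the math, here is a back-of-the-envelope Python sketch using only the figures quoted from Amin above. It is a rough simple-payback calculation, not his method: it assumes the full savings show up every year from the start, and it ignores inflation, interest, and the fact that the spending is spread over 10 to 20 years. The variable and function names are mine.

```python
# All figures in billions of dollars, as quoted in the article.
STRONGER_PER_YEAR = 8         # high-voltage lines and upgrades
STRONGER_YEARS = 10
SMARTER_PER_YEAR = (17, 20)   # low and high estimates
SMARTER_YEARS = 20
SAVINGS_PER_YEAR = 49 + 20    # avoided blackout losses + energy savings


def total_cost(smarter_per_year):
    """Total spend on a stronger grid plus a smarter grid."""
    stronger = STRONGER_PER_YEAR * STRONGER_YEARS
    smarter = smarter_per_year * SMARTER_YEARS
    return stronger + smarter


low_cost, high_cost = (total_cost(rate) for rate in SMARTER_PER_YEAR)

print(low_cost, high_cost)                     # 420 480
print(SAVINGS_PER_YEAR)                        # 69
print(round(low_cost / SAVINGS_PER_YEAR, 1))   # 6.1 years to break even
print(round(high_cost / SAVINGS_PER_YEAR, 1))  # 7.0 years to break even
```

Under those generous assumptions, $420 billion to $480 billion of spending is matched by about $69 billion a year in savings, which is roughly six to seven years' worth.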

READ MORE

Learn about how the grid works and what grid controllers do by reading a free chapter from my book, Before the Lights Go Out.

Read the full report on the 2003 blackout

***The original version of this story stated that electrons moved at almost the speed of light. This is a misunderstanding on my part. I've changed the wording to reflect what's really going on.

Image: Untitled, a Creative Commons Attribution Share-Alike (2.0) image from krunkwerke's photostream