/ Maggie Koerth-Baker / 8 am Wed, Apr 17 2013

    Why is it so hard to make a phone call in emergency situations?

    When bombs exploded at the Boston Marathon on Monday, my Facebook feed was immediately filled with urgent messages. I watched as my friends and family implored their friends and family in Boston to check in, and lamented the fact that nobody could seem to get a solid cell phone connection. Calls were made, but they got dropped. More often, they were never connected to begin with. There was even a rumor circulating that all cell phone service to the city had been switched off at the request of law enforcement.

    That rumor turns out to not be true. But it is a fact that, whenever disaster strikes, it becomes difficult to reach the people you care about. Right at the moment when you really need to hear a familiar voice, you often can't. So what gives?

    To find out why it's frequently so difficult to successfully place a call during emergencies, I spoke with Brough Turner, an entrepreneur, engineer, and writer who has been working with phone systems (both wired and wireless) for 25 years. Turner helped me understand how the behind-the-scenes infrastructure of cell phones works, and why that infrastructure gets bogged down when lots of people are suddenly trying to make calls all at once from a single place. He says there are some things that can be done to fix this issue, but, ultimately, it's more complicated than just asking what the technology can and cannot do. In some ways, service failures like this are a price we pay for having a choice and not being subject to a total monopoly.

    Maggie Koerth-Baker: The problem of not being able to reach loved ones on the phone during an emergency isn't exactly new, right? Land lines had to deal with this, as well. Just to refresh our memories, what happened when land lines got congested with call traffic?

    Brough Turner: Well, say you'd have an earthquake in California. This was for the old Bell system. The national long-distance routing had a set of standard, predefined routes, and it had network control centers in New Jersey and other places. Things would get overloaded and they would manually intervene by putting access restrictions on new calls coming into the area that was congested. In the 60s, 70s, and 80s they would let through one out of every five call attempts. They were doing that manually and just arbitrarily to reduce congestion. Over time things got more automated. During the long-distance competition of the 1990s, AT&T introduced computerized routing and started using automated rate limiting. It all really got quite sophisticated before the whole industry went away.
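
    The "one out of every five call attempts" restriction Turner describes can be sketched in a few lines. This is only an illustration of the idea, not how the Bell system implemented it; the function name `gate_calls` and the `admit_every` parameter are made up for this example.

```python
def gate_calls(attempts, admit_every=5):
    """Admit one call attempt out of every `admit_every`; reject the rest.

    Hypothetical sketch of the gating rule described in the interview.
    """
    admitted, rejected = [], []
    for index, call in enumerate(attempts):
        # Attempts 0, 5, 10, ... are let into the congested area.
        if index % admit_every == 0:
            admitted.append(call)
        else:
            rejected.append(call)
    return admitted, rejected


attempts = [f"call-{n}" for n in range(20)]
admitted, rejected = gate_calls(attempts)
print(len(admitted), len(rejected))  # 4 16
```

    The point of a rule this blunt is that it caps the load on the congested area no matter how many people are dialing in.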

    MKB: What about with cell phones? We aren't talking about wires anymore, so what's really going on behind the scenes when we say that the phone network is congested?

    BT: First off, different cell phone providers use different technologies, different systems. I'm talking about the GSM system used by AT&T and T-Mobile. I know less about the Qualcomm version that's used by Verizon and Sprint. They evolved in different ways and the details are different, but the basic principles are the same for all. With 4G, by the way, that's changing. Everybody is converging on the technology that comes from that GSM tradition.

    In general, though, there are a bunch of different places where congestion can happen. Networks consist of different technologies, and different levels. You start with the mobile switching center that may cover a large area. There are only one or two mobile switches for Eastern Massachusetts. We're talking about a room full of racks, full of computers and other switching elements. The densest switch is in China, and they have something that will serve more than several million customers at a time.

    So you have the mobile switching center. Then you have groups referred to as radio node controllers. There are dozens to hundreds of these controlled by one switch. They're located closer to the radios and they deal with handoffs between different radios.

    Then, of course, you have the individual radios and that's where you see antennas on top of and on sides of buildings. Those are everywhere. Each of those is a cell, and in each cell you have users who are connected to the network.

    MKB: So this is really about how you, as a cell phone user, move around a physical area? You get handed off from one radio to another, from one node controller to another, and as you travel a lot farther, from one switch to another?

    BT: Yup. The other thing about the radios is that they have different sizes of cells. You've got regular cells and then smaller sub-cells. You also have larger overlay macro-cells that are really big. They try to handle you within the small cell you're closest to. But it's a trade-off between capacity (they'd like to have lots of small cells for that) and coverage (they don't want to put 100k small cells everywhere). So you might have a cell that covers a one-mile area and then smaller cells within that that handle most of the traffic.

    Interesting thing is that most people are actually stationary, sitting on their butts. For most people, calls originate from one or two locations and they stay there the whole time. But we have to have this incredibly complicated system to deal with the 5-8% of people who move around. Maybe less than that.

    MKB: So what happens when you suddenly get a lot of calls happening within one cell?

    BT: They can offload some of that to a macro-cell. When it's a planned event (the Boston Marathon, for instance, before the bombings), they can bring in additional mobile cells. They park little trucks around the edge of the event. All those radios, though, have to connect back to the radio network controller. If it's an installed radio it's probably a wired connection, copper or fiber. But when you can't get that, then they use point-to-point wireless. Either way, they call that the backhaul.

    In different parts of the system different things will get congested. In some cases, the specific cell site might be overloaded and macros are also overloaded. In other cases, it's the backhaul that gets overloaded. And it doesn't even take an emergency to cause that. There's this great story where [telecommunications expert] David Reed was driving from New York to Boston in the middle of the night. His wife was driving and he was sitting there with one of the first iPads that had 3G service, and as they drove through Connecticut he was running speed tests along the way. Just to see the different responses in different cells. And at one point, he was limited to, like, 3 Mbps. It was 3:00 am, so it wasn't about lots of people using the system. It was just that he was driving through a cell where the only backhaul was two T1 lines. So 3 Mbps was the maximum anybody in that cell could ever get. And this was like a 20 mile stretch of highway.
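
    The arithmetic behind that 3 Mbps ceiling is easy to check. A T1 line runs at 1.544 Mbit/s, so two of them give a little over 3 Mbit/s for the whole cell. The sketch below works that out; the even-split function is a simplifying assumption added for illustration, since real schedulers do not divide capacity so neatly.

```python
T1_LINE_RATE_MBPS = 1.544  # standard T1 line rate in Mbit/s


def backhaul_capacity_mbps(t1_lines):
    """Total backhaul capacity of a cell fed by `t1_lines` T1 circuits."""
    return t1_lines * T1_LINE_RATE_MBPS


def even_share_mbps(t1_lines, active_users):
    """Assumed even split of the backhaul among active users (illustrative)."""
    return backhaul_capacity_mbps(t1_lines) / active_users


print(round(backhaul_capacity_mbps(2), 3))  # 3.088
print(round(even_share_mbps(2, 10), 4))     # 0.3088
```

    One user alone at 3:00 am sees the full 3 Mbps or so; ten active users on the same stretch of highway would be sharing it.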

    MKB: So there was only so much information that could go in and out at a time. Wow. I know that channels, the actual wireless signals from and to your phone are also important. Can you talk about those?

    BT: There are a bunch of separate channels in the wireless system. But the big division is between a control channel and all of these traffic carrying channels. Control channels are used for a lot of different things. For instance, they're used for call set-up and call tear-down. Your handset looks on a particular control channel for permission to make a request. It uses the control channel to request to make a call, like, "I need enough capacity to set up a call," so then the system can find the traffic channel with enough free space. But they're also used for SMS messages. Which is interesting.

    MKB: Yeah. I've heard that, when you're in a situation where lots of people are placing phone calls, it's often easier to get a text message through. Is this why? And, if so, is it a good way to use the system? What I mean is, is the system as a whole better off if you text your friend in Boston to check in, rather than trying to call him?

    BT: Yes. It's much better. The SMS messages have a relatively light footprint, first of all. The second thing is that they're asynchronous. If they can't get through this instant, they keep trying. If it gets over the radio to the cell site, it will get through. Even if it's delayed for 30 seconds or something. With voice you're either connected or you're not, and when you are that means that the traffic channel is tied up until you're done talking. More likely, it means you never get connected because traffic channels are already saturated.
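
    The difference Turner describes between all-or-nothing voice and retrying text messages can be shown with a toy model. Everything here is a simplification for illustration: time is divided into ticks, `capacity_free` is a made-up list saying whether the network has room at each tick, and the function names are hypothetical.

```python
def try_voice_call(capacity_free, start):
    """Voice is all-or-nothing: it connects only if there is room right now."""
    return capacity_free[start]


def send_sms(capacity_free, start):
    """Asynchronous delivery: retry every tick until there is room.

    Returns the delay in ticks, or None if no room ever appears.
    """
    for tick in range(start, len(capacity_free)):
        if capacity_free[tick]:
            return tick - start
    return None


# Network is saturated for three ticks, then briefly has room.
capacity_free = [False, False, False, True, False]

print(try_voice_call(capacity_free, 0))  # False: the call never connects
print(send_sms(capacity_free, 0))        # 3: the text arrives three ticks late
```

    The text message only needs one brief opening, while a voice call needs a channel at the moment of dialing and then holds it for the whole conversation.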

    MKB: In an emergency, can the cell phone companies limit access to the network the same way the Bell system used to do with land lines?

    BT: Yes. Now this is a piece where I know what equipment these large carriers have, but I don't know how they've chosen to implement capabilities that are there. So one way they can do this is they can bar new traffic being originated by people based on "class". There are typically 10 classes for regular subscribers and another six classes that handle things like 911 calls and emergency services. They can control which classes have access at the level of cells, or by groups of cells, or all of Eastern Massachusetts if they wish.

    I'm not clear on how automated all of this is. They definitely have the ability to have it totally automated. There's technology you can buy from Ericsson that features call-load-triggered access class barring, so it automatically invokes certain policies about who can place calls in an area if the traffic there exceeds a pre-determined threshold. But that's an extra feature and you have to pay extra for it ... I guarantee it's in the range of tens of thousands of dollars per mobile switch. So who knows what decision the carriers made about that. It might have been automated and it might not be.

    What I am sure of is that they set up priorities for people with fire and safety access classes. And I think it's also clear that the Verizon mobile switching center was overloaded on Monday. The effect I observed in Massachusetts was you could not place a call from a landline into the Verizon mobile network for some period of time. They blocked all incoming calls for some period of time. But within the network [Verizon to Verizon] some number of calls were getting through. I didn't succeed, but some friends did after trying for 5 or 10 minutes. In overload cases they won't turn off everything. They'll say fire and safety get through immediately and maybe 10% of the other calls get to go through. They don't throttle down to zero, though, because you don't know if somebody desperately needs to make that connection.
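
    A load-triggered barring policy of the kind Turner describes might look roughly like the sketch below. It assumes the class layout given in the interview (ten regular classes plus six priority ones); the class numbering, the 0.8 load threshold, the `rotation` parameter, and the function names are all hypothetical, and this is not a description of any vendor's actual feature.

```python
REGULAR_CLASSES = list(range(10))        # ten classes for regular subscribers
PRIORITY_CLASSES = list(range(10, 16))   # assumed numbering for the other six


def allowed_classes(load, threshold=0.8, regular_allowed=1, rotation=0):
    """Return the set of access classes allowed to originate calls.

    Below the threshold everyone is allowed. Above it, priority classes
    stay open and only `regular_allowed` regular classes get through;
    `rotation` picks which ones so the burden can be shared over time.
    """
    if load <= threshold:
        return set(REGULAR_CLASSES) | set(PRIORITY_CLASSES)
    open_regular = {(rotation + i) % len(REGULAR_CLASSES)
                    for i in range(regular_allowed)}
    return open_regular | set(PRIORITY_CLASSES)


def may_originate(access_class, load, **policy):
    """True if a handset in `access_class` may place a call at this load."""
    return access_class in allowed_classes(load, **policy)


print(may_originate(3, load=0.5))    # True: normal conditions
print(may_originate(3, load=0.95))   # False: regular class barred
print(may_originate(12, load=0.95))  # True: priority class always open
```

    With one of the ten regular classes left open, roughly 10% of ordinary subscribers can still place calls, which lines up with the "maybe 10%" figure above.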

    MKB: Is this an issue that can be fixed? In some of our background conversations before this interview, I got the impression that this isn't all about what the technology can do, but also what companies do with it. That there are a lot of trade-offs people make and congestion like this during emergencies is one of the side effects of those trade-offs.

    BT: In the end, it does come down to trade-offs. That's true of any network. You're interested in coverage first and then capacity. If you wanted to guarantee that a network never had an outage your capital investment would have to go up orders of magnitude beyond anything that is rational. So each network is trying to invest their budget in ways that make the network appear to perform better.

    The cost of providing temporary extra capacity for the Boston Marathon, that's something that's in the budget and they plan for that event. But when you get something unexpected like a terrorist event, or an earthquake, or damage from a hurricane or tornado, then you have trade-offs between capital and how robust your network is. Every time you have an event people say, "Oh, they didn't invest enough." But you look at New York City after Hurricane Sandy and Southern Manhattan was under 6 feet of water, and all the buried infrastructure was lost. Meanwhile, in other places, a significant number of cell sites were knocked out because connections ran on overhead poles and got knocked down by trees. The antenna site literally got destroyed. Interestingly, you can lose 30% of your cells and still get coverage. Coverage was there in New Jersey after Sandy, even with 1/3 of the network out. The catch is there wasn't much capacity.

    MKB: Are more robust networks something that could be regulated? I ask because I've gotten the impression that some people are concerned that when cell service is congested during a disaster, there will be a cry for the government to do something ... and the unintended effects of that would actually leave us with a cell system that we maybe don't want, something that gives a few corporations a lot more power.

    BT: I honestly don't know how you could regulate it to work the way you wanted it to all the time. Reliability on the old Bell system was relatively high ... and we paid a high price for that as consumers because to get that level of service they got to be a monopoly and they got to charge us a rate that allowed them to make a return on their investments.

    With cellular systems, competition seems to drive more optimal decisions. We don't have as much competition as we used to, but there's still some. You really want at least four-to-six carriers, and most places it's really only like three or three-and-a-half. For the public, we have to have a trade-off between getting coverage we want and being stuck with a monopoly. You look at electricity or fixed-line phone systems, and there are regulations on those industries about how much coverage and capacity they have to have because it has to be a good system — you as the consumer have no other choice. They're monopolies.

    Image: ~ Timepass ! ~, a Creative Commons Attribution No-Derivative-Works (2.0) image from neokratz's photostream

    COMMENTS


    1. This is why:

      “If you wanted to guarantee that a network never had an outage your
      capital investment would have to go up orders of magnitude beyond
      anything that is rational.”

        1. Well, yeah! I was thrilled to find a copy of “A History of Science and Engineering in the Bell System” at my local public library 20 years ago, and checked it out many times to read it cover to cover. 

      1.  Rational nothing, cost effective to a private company aiming for dividends for its shareholders however…

    2. The same effect can be seen at midnight on New Year’s Eve as the world and his wife tries to wish friends and relatives happy new year. Or on a large festival site when a band finishes on a stage and loads of people try to hook up with their pals.

    3. In other words, the free market cannot provide telephone service in an emergency. That’s all there is to it. A government regulated monopoly could, but that, like telephone service in an emergency, is against modern American ideology.

      1. The old Bell telephone system was very good, being engineered to handle Mother’s Day traffic volume. A long distance call cost 40 cents a minute 30 years ago. It was a dollar a minute in the twenties, when a person of the working class earned a dollar or two a day.

        1. One of the reasons why the Bell telephone system could handle that peak Mother’s Day traffic was *because* it cost 40 cents per minute. Had the price been more in line with the rates of today, then demand would have risen beyond the network’s capacity.

          1.  Landline phones would saturate the system if 12% of users decided to use it at the same time. I worked on the Autovon (military) system, which could handle 100% of the users, but at very high cost.

      2.  Capacity is never infinite.  Back when there was only one telco, you got stuff like this too.  That’s why there are override codes for cell phone systems that emergency agencies can use to get access.  Assuming you didn’t lose the local cell towers in the crisis zone.

        No matter how many circuits you put in they will get overloaded, in a crisis people will insist on calling Aunt Mamie and yapping about it endlessly even when officials are begging people to stay off the system so real coordination and work can be done and those who might be trapped under buildings etc (depending on the emergency) can get a damn call for help through instead of dying.

        Ever watch a fire truck or paramedic ambulance desperately trying to get through busy streets where everyone has the “be damned to your emergency, *I* will proceed through this intersection and that fire can bloody well wait” attitude, or force its way through a mob of rubberneckers at a major situation who by God will *not* move even if people are dying?

        As long as we continue to breed stupid people and excuse them personal responsibility for their actions, we will have these issues.

    4. I really wish service providers made this information common knowledge, and if not the service providers, then the federal government (i.e. PSAs about the importance of texting rather than calling during a citywide, statewide, or nationwide crisis).

      1. Yes, it’s not common knowledge. People just assume that the system has infinite bandwidth, when it’s actually pared to the bone for profit margin reasons.

        I participated in a fun art project at Coachella last weekend. We had an absurdist “Coachella Power Station” set piece with smokestacks, satellite dishes, Tesla coils and a fancy control room staffed by… hippos.

        The cell phone service at the venue was phenomenally good considering the number of people there, but there were no electrical outlets to charge phones. I built a FREE cell phone charging station positioned out front. The charging circuit was designed to be weak, so that it could detect too many phones charging at once and trigger an alarm to distract and confuse the hippos. It took about 4 hours to charge an iPhone. (I could have just put a dozen cube USB chargers in power strips, but this was not a public service, it was art.)

        The end result was that a gaggle of college girls was always stuck in front of the phone charging station, waiting hours for their phones to charge, being subjected to hippos doing crazy stuff right before their eyes.

        Moral? You get what you pay for.

        1. I saw a photo of that, including the gaggle of college girls, but didn’t know about the hippos etc. Very cool to hear that you were behind it (and to hear the details).

          I remember when I went to Coachella in 2010 there was no place to charge besides a small outlet array (with a handful of outlets at most) in the campground, which had an absurd queue considering how long each person would be needing to charge (I just left my phone off most of the weekend).

          1. When I moved to this area, if you drove down to where Coachella is held, you would have seen a line of people sitting in the dirt by the side of the road, waiting for the payphone.

      2.  Here in areas affected by the storm Sandy it became apparent.  Many cell towers lost power thereby deactivating themselves.  People were forced into the realization that text messaging works when voice doesn’t.

      3. On the topic of texting, yes and no. At least on the older GSM setups texts are delivered on the cell’s control channel, the very same channel used to set up and tear down incoming and outgoing calls. This meant that you could in effect drown a GSM cell via text messages but anyone holding an ongoing conversation (handled by different radio channels) would not notice the issue.

        With the introduction of GPRS however, GSM phones can send and receive texts as data traffic on the same channels that voice traffic uses. If you have an older phone, or stumble into an old GSM cell, your texts may still be handled on the control channel however.

    5. Very nice interview;  Brough Turner is spot on.  An excellent review of cell phone capacity vs coverage.
      In 4G/LTE networks, a typical antenna (called an e-nodeB) can handle about 300 to 800 cell phone conversations at any instant.  This is typically divided into 3 sectors, each with a capability of 150 to 200 connections.  These numbers depend greatly on how much bandwidth the carrier “owns”, as well as backhaul bandwidth, and local geography. 

      As Brough mentions, cell phone carriers will bring in mobile cell phone sites for special events.  These are called COW’s: Cell on Wheels, or sometimes CIAB (cell in a box) or COLTS (Cell on light trucks).  Often, they’re generator powered; you find them at outskirts of stadium parking lots.
      Curiously, the major problem with cell phone coverage is usually not signal strength, but rather interference.  Making signals stronger causes more interference; adding more COWs must be done carefully, or the footprints of their signals overlap with each other.

      Earlier LTE standards were mainly about getting the system to work. Now that these 4G systems are successfully deployed, new LTE standards (under the 3GPP consortium) are increasingly directed towards emergency services, earthquake/tsunami warnings, and non-voice emergency services.  There’s also a good deal of work being done on locating an individual cell phone, and on “lawful intercept” operations. (Precisely locating a cell phone, by the way, is surprisingly difficult without GPS.)

      As Brough Turner points out, a major crowd or serious disturbance is almost certain to overload a cell phone system; long ago these would result in lines outside of telephone booths or complaints of “no dial tone”.  Even if a disaster doesn’t cause any cellphone towers to fall, the system will be overloaded by calls.  Those of us in earthquake country should assume that our cell phone service will be spotty (at best) in the days following a quake.

    6. I worked for a CDMA carrier in Florida for many years. Let me expand a bit.

      If incoming calls are experiencing call blocking (i.e., fast busy signals), it is not something that is done automatically, but something that happens because of capacity. Carriers have dedicated lines to various points. There are lines to 611 (customer service), lines to 911, lines to the long distance carrier, lines to other cell phones on the same network, and lines to the local wireline carrier (aka, telco). Every call that you make to another wireless provider has to go through the local telco. The telco is the middleman that all the providers use to talk to each other. So if you are Verizon calling ATT, you have to go through the telco. If you are a landline customer calling a VZ cell phone, you have to go through the local telco.  Those lines have the greatest capacity, but there is certainly a limit to how many calls can go in or out through the telco to your carrier. That’s why you can generally call your husband on your account when there’s an emergency, but can’t call home. That’s also why you can usually dial 611 to complain about the fact that you can’t call home, because that capacity is separated.

      I know with Lucent switches that particular numbers could be designated as high priority, meaning that once that person hits send, if there was a capacity issue someone else would have been booted off the network. I am not sure if other types of switches are able to do that.


      1. In GSM, the concept of an emergency telephone number that takes priority is baked into the system, so dialling 112 while connected to a cell at capacity will indeed disconnect someone. The networks *should* configure other locally equivalent numbers (so 911 in the US, 999 in the UK) to act in the same way. Sometimes other common numbers work too (so I believe dialling 911 in the UK will get you through to the 999 line just because there may be kids who have not been taught about the correct UK numbers, but may be familiar with 911 through exposure on the TV – not sure if this is at network or handset level). You also should be able to dial these numbers even without unlocking the handset (where a keypad is available).

    7. What’s missing from the original article is an explanation of how phone system capacity is designed. For decades, going way back into the Bell System days, portions of the phone system have been sized so that the likelihood of a call going through is high under any normal circumstances. Making the cost of the infrastructure commercially viable depends on the fact that people make calls at different, random times. Look up “queueing theory.”

      If 99% of calls go through with transmission capacity available for only 30% of the total number of phones in an area, that’s fine.

      The trouble in an emergency is that the normal phone traffic patterns go out the window. Everybody wants to know people in the emergency area are OK, by calling to or from that location. So the demand for phone calls goes up far beyond anything considered normal.  Boosting capacity to handle the worst case scenario would require huge investments. In this regard, cell phone service is not much different from the landlines of many decades ago.
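
      The sizing idea in this comment is usually worked out with the Erlang B formula from queueing theory, which gives the fraction of calls blocked when a given amount of traffic is offered to a fixed number of channels. The sketch below uses the standard recursive form of that formula; the channel count and traffic figures are invented for illustration, and the model assumes random call arrivals with blocked calls simply dropped.

```python
def erlang_b(offered_erlangs, channels):
    """Blocking probability from the Erlang B formula (recursive form).

    offered_erlangs: average number of simultaneous calls being attempted.
    channels: number of calls the system can carry at once.
    """
    blocking = 1.0  # with zero channels, every call is blocked
    for m in range(1, channels + 1):
        blocking = (offered_erlangs * blocking) / (m + offered_erlangs * blocking)
    return blocking


CHANNELS = 100           # hypothetical capacity of one part of the network
NORMAL_LOAD = 80.0       # hypothetical everyday traffic, in erlangs
EMERGENCY_LOAD = 400.0   # hypothetical surge when everyone calls at once

print(f"normal:    {erlang_b(NORMAL_LOAD, CHANNELS):.2%} of calls blocked")
print(f"emergency: {erlang_b(EMERGENCY_LOAD, CHANNELS):.2%} of calls blocked")
```

      With these made-up numbers, nearly every call goes through on a normal day, while in the surge at least three out of four attempts fail, because 100 channels can carry at most a quarter of 400 simultaneous calls.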

    8. There’s also an ability for emergency responders to gain priority in both the wired and wireless telephony system. Not for nefarious purposes, or because they are #1, but because you’d rather have the police, fire and ambulance services get priority on cellular service in the event of a broad disaster.

      1.  Or business users?
        Our company have a number of SIMs with elevated privileges for this purpose which we bought after 7/7.
