Prime Suspect, or Random Acts of Keyness

The foundation of Web security rests on the notion that two very large prime numbers, numbers divisible only by themselves and 1, once multiplied together are irreducibly difficult to tease back apart. Researchers have discovered that, in some cases, a lack of entropy, a shortage of disorder in how those prime numbers are chosen, undermines that foundation. By analogy, most buildings on the Web would stand through gale-force winds and magnitude 10 earthquakes, while others can be pushed over with a finger or a breath. The weakness affects as many as 4 in 1,000 publicly available secured Web servers, but it appears in practice that few if any popular Web sites are at risk.

Why primes are useful for security

Web security relies on generating two large prime numbers with enough randomness that it is vanishingly unlikely any two parties in the world would ever stumble across the same number. That uniqueness can't be guaranteed outright, but it is nevertheless a requirement of RSA, the algorithm nearly all Web browser/server transactions rely on to secure connections via HTTPS.

Here's how it works. Take two big prime numbers that are also similar in size and multiply them:

858599503 * 879190747 = 754872738416398741

The only legitimate divisors (the factors of the product) of such a number are 1, the number itself, and the two primes. That it divides by 1 and itself is obvious. But even using all available mathematical and computational approaches, it takes an inordinate amount of time to tease out the two prime numbers used to create it.

So long as enough entropy is involved, larger primes, and thus a larger product (at least 1024 bits long, or over 300 decimal digits, though as we'll see 2048 or 4096 bits is far better), result in better security. (The example above, only 60 bits long in binary, would be woefully inadequate, taking perhaps seconds to factor.)
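To make that concrete, here is a minimal Python sketch that recovers the two primes from the example above. It uses Pollard's rho method, one standard technique for factoring small numbers; the code is my own illustration, not anything taken from the research discussed below.

    import math
    import random

    def pollard_rho(n):
        """Return a nontrivial factor of the composite number n (Pollard's rho method)."""
        if n % 2 == 0:
            return 2
        while True:
            x = random.randrange(2, n)
            y, c, d = x, random.randrange(1, n), 1
            while d == 1:
                x = (x * x + c) % n           # the "tortoise" takes one step
                y = (y * y + c) % n
                y = (y * y + c) % n           # the "hare" takes two steps
                d = math.gcd(abs(x - y), n)
            if d != n:                        # d == n means a failed round; try new values
                return d

    n = 754872738416398741                    # the 60-bit product from the example above
    p = pollard_rho(n)
    q = n // p
    print(sorted([p, q]))                     # [858599503, 879190747], found in well under a second
    assert p * q == n

A product hundreds of digits long, by contrast, would resist every known factoring method for far longer than anyone cares to wait.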

Less entropy means that the same large prime numbers wind up being used repeatedly in keys generated on different systems, which makes it simple to find the factors of keys used by publicly reachable sites, routers, and other hardware.

What's wrong with RSA

The flaw in randomness that has been revealed has to do with the public part of public-key cryptography; RSA is a public-key algorithm, and virtually all public Web servers rely on it. With the RSA algorithm, the two primes are used as part of a method that derives two related exponents (powers to which a number is raised). The exponents and the primes' product are used to encrypt and decrypt a message. The product of the primes and the public exponent may be distributed freely without compromising the private exponent (or the two prime factors, for that matter, which aren't used again). A sender uses a recipient's public key to encrypt a message; the recipient uses the private key to reverse the process (using a multiplicative inverse operation, as I just learned).
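To make those moving parts concrete, here is a toy, textbook-RSA sketch in Python. The tiny primes (61 and 53), the exponent 17, and the message are my own illustrative choices; there is no padding and nothing here is usable for real security, but it shows how the primes, their product, and the two exponents relate.

    # Textbook RSA with toy numbers -- for illustration only, never for real use.
    p, q = 61, 53                        # two primes; real keys use primes hundreds of digits long
    n = p * q                            # their product, the public modulus (3233 here)
    phi = (p - 1) * (q - 1)              # used once to derive the exponents

    e = 17                               # the public exponent, published along with n
    d = pow(e, -1, phi)                  # the private exponent: e's multiplicative inverse mod phi (Python 3.8+)

    message = 42                         # a message encoded as a number smaller than n
    ciphertext = pow(message, e, n)      # anyone can encrypt with the public key (n, e)
    recovered = pow(ciphertext, d, n)    # only the private exponent reverses it

    assert recovered == message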

The weakness found by two different sets of researchers doesn't relate to the algorithm itself. It's already known that sufficiently small products may be factored back to their original primes using the best known computational methods. Products of 512 bits (about 155 decimal digits) or smaller are essentially broken; a 768-bit product has been factored, but only with substantial effort.

No, the problem isn't with a new method of speeding up factorization. Rather, as noted earlier, it's about entropy and the public nature of public keys. Web sites that use SSL/TLS for encrypting connections with clients publish a certificate that the client receives as part of an initial handshake to establish security. This certificate includes the public key information along with other identifiers and, in nearly all circumstances, a signature from a certificate authority (CA) to allow additional validation. (There are problems with certificate authorities' authoritativeness, too, but that's a different issue.)

These certificates from public Web sites may be freely retrieved, and such retrieval is happening ever more frequently. Growing distrust of the measures CAs employ to keep SSL/TLS certificates from being issued to illegitimate parties has led various projects to scan all public Web sites continuously and alert users and site operators if the certificate for a given domain changes over time. That is what led the Electronic Frontier Foundation (EFF), for one, most recently to pull down 7.1 million certificates as part of its SSL Observatory. Another research group harvested millions of public certificates on its own in a relatively fast and painless operation.
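To give a sense of how easy a single retrieval is, here is a minimal Python sketch that fetches one site's certificate and pulls out its RSA public key. The hostname is a placeholder, the third-party cryptography package is my own choice of parsing tool, and the sketch assumes the site presents an RSA certificate; the large-scale harvesting projects work differently in their particulars.

    import ssl
    from cryptography import x509        # third-party package used here to parse the certificate

    # Fetch the PEM-encoded certificate the server presents during the TLS handshake.
    pem = ssl.get_server_certificate(("www.example.com", 443))
    cert = x509.load_pem_x509_certificate(pem.encode())

    numbers = cert.public_key().public_numbers()
    print(numbers.e)                      # the public exponent, typically 65537
    print(numbers.n)                      # the modulus: the product of the server's two secret primes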

Once you have a large database of certificates, you can extract the public keys and look at the products. Researchers from the École Polytechnique Fédérale de Lausanne (Switzerland) and one independent collaborator released an academic paper on February 14th that examined both public keys in SSL/TLS certificates and PGP-style public keys typically used for person-to-person communications. The PGP keys revealed almost no problems; RSA keys in Web certificates, on the other hand, were very troublesome. The authors found that about 0.2% (2 in 1,000) of the dominant 1024-bit RSA keys they checked in a first set, and closer to 0.4% of a larger set with more recent data, were vulnerable because two or more certificates shared a prime factor.

It's not impossible that RSA key generation software would have "collisions," in which the same primes were generated on multiple occasions by different systems. But such collisions should be far rarer than what was observed. Thus, there is some flaw that prevents the degree of randomness necessary to ensure the least possible repetition. In private use, this wouldn't matter. But because public-key cryptography relies on publishing keys, such overlap may be found easily. (CAs don't typically generate the public/private key pair. System administrators generate an RSA key using software like openssl on a computer under the admin's control. That key is then bound up in a certificate signing request that's submitted to a CA to get its imprimatur.)
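Here is a toy simulation of my own (not drawn from either research team's work) showing why a too-small pool of randomness produces exactly this kind of overlap: when a thousand devices seed their random number generators from only a few hundred possible starting states, the pigeonhole principle guarantees that some of them draw identical numbers to use as primes.

    import random
    from collections import Counter

    def first_prime_candidate(seed_bits=8):
        """Simulate one device generating its first RSA prime from a low-entropy seed."""
        seed = random.getrandbits(seed_bits)   # only 256 possible "boot states" in this toy model
        rng = random.Random(seed)              # fully deterministic once the seed is fixed
        return rng.getrandbits(512) | 1        # the candidate the device would then test for primality

    candidates = [first_prime_candidate() for _ in range(1000)]
    overlapping = sum(count for count in Counter(candidates).values() if count > 1)
    print(f"{overlapping} of 1,000 simulated devices drew a candidate that another device also drew")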

I'm not a mathematician, so I can't translate the full specifics of how sets of different products sharing a single prime factor allow relatively easy extraction of the other primes. Once you have both primes, the calculation necessary to create the private key exponent may be performed without any fuss. That eliminates all protection from transactions conducted using that RSA key.
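The core of the trick can at least be sketched, because the underlying arithmetic is standard: if two public moduli happen to share one prime, their greatest common divisor is that prime, so a single gcd computation exposes it with no factoring at all, and plain division then yields each modulus's other prime. (Doing this efficiently across millions of keys takes more care, but the principle is the same.) A toy Python illustration, reusing the example product from earlier alongside a made-up second modulus that shares one of its primes:

    import math

    n_site_a = 754872738416398741         # the example product: 858599503 * 879190747
    n_site_b = 879190747 * 2147483647     # a hypothetical second key that reuses the prime 879190747

    shared = math.gcd(n_site_a, n_site_b)
    print(shared)                          # 879190747 -- the prime the two keys have in common
    print(n_site_a // shared)              # 858599503 -- the other prime of the first key
    print(n_site_b // shared)              # 2147483647 -- the other prime of the second key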

SSL/TLS and other methods of establishing secure network connections use public-key cryptography only for the initial part of the setup, to exchange a short symmetric key that both parties then use; symmetric encryption is much faster to compute, and the key is used only for a session or part of a session. The public-key portion ensures that the session key isn't sniffed. But if the private key is known, the initial exchange of the session key may be decrypted through passive data sniffing, and thus the session key revealed. In some implementations, as EFF notes in a post on this research, learning the private key could allow decryption of previously recorded encrypted communications if someone attempting to intercept such traffic retained records for later cracking.
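Continuing the toy textbook-RSA numbers from earlier (no padding, tiny primes, purely illustrative, and only a loose stand-in for what TLS actually does), the pattern and the risk look roughly like this:

    import random

    p, q = 61, 53                         # the same toy primes as before
    n, e = p * q, 17                      # the server's public key, shipped in its certificate
    d = pow(e, -1, (p - 1) * (q - 1))     # the server's private exponent

    session_key = random.randrange(2, n)  # the client picks a short symmetric session key
    wrapped = pow(session_key, e, n)      # ...and sends it encrypted to the server's public key

    # The server unwraps the session key, and both sides switch to fast symmetric encryption.
    assert pow(wrapped, d, n) == session_key

    # A passive eavesdropper who recorded `wrapped` and later obtains d (say, by factoring
    # a weak n) recovers the very same session key -- and with it, the recorded session.
    recovered_by_attacker = pow(wrapped, d, n)
    assert recovered_by_attacker == session_key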

The Lausanne researchers worried about disclosure, as their work could be easily replicated. In the end, they notified as many certificate owners with extractable primes as they could, based on contact information in the certificates and at the Web sites where the certificates are used; they found that information to be quite spotty in practice. EFF is also engaged in notifying vulnerable parties.

The fallout

This is all troubling, but what's the upshot for practical threats?

First, the authors of the Lausanne paper found that 2048-bit RSA keys aren't invulnerable, but only a literal handful (10 keys out of 3.2 million in the latest dataset examined) could be factored. That compares to about 20,000 out of 3.7 million 1024-bit keys. EFF's recent scan also shows that Web sites are migrating to 2048 bits, with nearly a quarter of all certificates moving from 1024 to 2048 bits between its August 2010 scan and its 2012 scan.

Second, for a broken certificate to be used to sniff data, an observer has to be on the wire, in a position between clients and the server. If you can break into a server to monitor traffic, you don't need to subvert SSL/TLS; you can grab the decrypted data on the server as it arrives. Sniffing at a major data center or Internet exchange, where SSL/TLS is assumed to cover any risks, would possibly be easier to arrange, but it's not trivial. Being a man in the middle at the right point in the topology is largely the province of corporate spies and government agencies.

Third, a separate group of researchers outed themselves on February 15th, as they had been engaged in similar research. In a post at Princeton's Freedom to Tinker blog, Nadia Heninger explained that we shouldn't get our IP panties in a wad over this. Her group, which has a forthcoming paper it will release after finishing disclosures to affected hardware manufacturers, found that these weak public keys turned up primarily in embedded devices, such as consumer routers and corporate firewalls. She suggests that embedded device makers have the most work to do.

This is the dilemma with all security and encryption issues. This is a problem, a serious problem! And it's not theoretical. But the practical impact appears to be small, and it's fixable in a reasonable amount of time without an insane amount of effort. Still, there's no way to know which popular sites used weak RSA keys unless someone replicates this work and publishes it openly. Nor can we know whether any data was extracted from sessions conducted with them. The odds seem to favor few sites and no data, but it's simply impossible to know.

You can ignore any hype about this problem. RSA, public-key cryptography, and SSL/TLS certificates aren't broken or even damaged. We just need a few more random acts of key generation (and an option when making keys to check them against a public database of existing key products for shared primes), and the problem will disappear.