Nuts-and-bolts look at password cracking

Ars Technica's Nate Anderson decided to try cracking passwords (from a leaked file of MD5 hashes), to see how difficult it was. After a very long false start (he forgot to decompress the word-list file) that's covered in a little too much detail, Anderson settles down to cracking hashes in earnest, and provides some good data on the nuts and bolts of password security:

By this point I had puzzled out how Hashcat worked, so I dumped the GUI and switched back to the command-line version running on my much faster MacBook Air. My goal was to figure out how many hashes I could crack in, say, under 30 minutes, as well as which attacks were most efficient. I began again on my 17,000-hash file, this time having Hashcat remove each hash from the file once it was cracked. This way I knew exactly how many hashes each attack solved.

This set of attacks brought the number of uncracked MD5 hashes down from 17,000 to 8,790, but clearly the best "bang for the buck" came from running the RockYou list with the best64.rule iterations. In just 90 seconds, this attack would uncover 45 percent of the hashed passwords; additional attacks did little more, even those that took 16 minutes to run.

Cracking a significant number of the remaining passwords would take some much more serious effort. Applying the complex d3ad0ne.rule file to the massive RockYou dictionary, for instance, would require more than two hours of fan-spinning number-crunching. And brute force attacks using 6-character passwords only picked up a few additional results.
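
The core loop behind the dictionary-plus-rules attacks quoted above can be sketched in a few lines of Python. This is a minimal illustration, not how Hashcat is implemented. The `mangle` rules are hypothetical stand-ins loosely in the spirit of best64.rule (not its real contents), and the word list and leaked hashes are made up. Like the run described above, it drops each hash from the target pool once it is cracked.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Unsalted MD5 hex digest, the storage scheme of the leaked file."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def mangle(word: str):
    """Hypothetical mangling rules (NOT the real best64.rule contents)."""
    yield word
    yield word.capitalize()
    yield word + "1"
    yield word + "123"
    yield word[::-1]
    yield word.replace("a", "@").replace("o", "0")


def crack(target_hashes: set, wordlist: list) -> dict:
    """Return {hash: plaintext} for each target found; cracked hashes leave the pool."""
    remaining = set(target_hashes)
    cracked = {}
    for word in wordlist:
        for candidate in mangle(word):
            digest = md5_hex(candidate)
            if digest in remaining:
                cracked[digest] = candidate
                remaining.discard(digest)
        if not remaining:  # nothing left to crack
            break
    return cracked


leaked = {md5_hex("Password"), md5_hex("dragon123"), md5_hex("x9#Kq!v2")}
result = crack(leaked, ["password", "dragon", "monkey"])
print(sorted(result.values()))  # ['Password', 'dragon123']
```

The random-looking third password survives because no word-plus-rule combination produces it. That is the same reason the article's remaining hashes needed far heavier attacks.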

The point, really, is that if you want to understand the relative security of different password-generation techniques, you need to understand what's involved in state-of-the-art password cracking techniques.

How I became a password cracker


  1. The point that I felt was rather weakly made in this article is that these results come from cracking plain, unsalted hashes. That is well known to be woefully insecure. Anyone who doesn’t salt their password hashes doesn’t deserve to manage the information, and no user should trust them with an important password either.

    Within this context, the article is interesting, but I would hope it is generally inapplicable to the real world.

    1.  OKAY… admitting complete ignorance here, don’t judge me too harshly.  What is meant by salting one’s passwords? 

      1. “Salting” a password is when an administrator (not the end user) adds a bunch of characters to a password before hashing it.

        e.g., if your password is “swordfish”, the administrator adds “je8u2t5” and then hashes “swordfishje8u2t5”.

        This makes it significantly harder to crack while requiring no more effort on the part of the user, and a trivial amount of extra work for the administrator.

      2. My password is “sheep7”.  The md5 of “sheep7” is 417f44f719459efcf8a6854d77a4320c.  An unsalted password scheme stores just that.  When someone claims to be “Stephan Zielinski” and offers a password as proof, the system computes the md5 of what they offered; if it’s 417f44f719459efcf8a6854d77a4320c, it’s a match.
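
        The unsalted store-and-compare scheme just described might look like this in Python. It is a minimal sketch: the `stored` dictionary stands in for a real user database, and `hmac.compare_digest` provides a constant-time comparison, a detail the comment doesn't mention.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Unsalted scheme: the database keeps only md5(password) per user.
stored = {"Stephan Zielinski": md5_hex("sheep7")}


def check_login(user: str, offered: str) -> bool:
    """Hash the offered password and compare it to the stored digest."""
    expected = stored.get(user)
    if expected is None:
        return False
    return hmac.compare_digest(md5_hex(offered), expected)


print(check_login("Stephan Zielinski", "sheep7"))  # True
print(check_login("Stephan Zielinski", "sheep8"))  # False
```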

        Unfortunately, do it this way, and it’s possible for a bad guy to build a dictionary of md5s of common words and common variations on common words. If he can get his hands on the md5s of people’s passwords, he can look them up in the dictionary; any time there’s a match, he’s effectively got their password.
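
        The attacker's side is just as short: hash a word list once, keep the results in a table keyed by digest, and look up every leaked hash. The word list here is a hypothetical stand-in; real ones, like the RockYou list mentioned in the article, hold millions of entries.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical list of common passwords and variations.
common_words = ["password", "letmein", "dragon", "sheep7"]

# Built once, then reusable against any leak of unsalted MD5 hashes.
reverse_table = {md5_hex(word): word for word in common_words}


def look_up(leaked_hash: str):
    """Return the plaintext if the hash is in the precomputed table, else None."""
    return reverse_table.get(leaked_hash)


print(look_up(md5_hex("sheep7")))    # sheep7
print(look_up(md5_hex("x9#Kq!v2")))  # None
```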

        One way to improve this is to md5 the password plus a little more.  For instance, compute not the md5 of the password, but of the phrase “My very fine password is {whatever}”.  Example: the md5 of “My very fine password is sheep7” is b43b43137f5ace7a8b00e40232c82f38.  Under this scheme, someone claiming to be me offers a password, the system prepends “My very fine password is ” to it, and computes the md5 of the combination; if it turns out to be b43b43137f5ace7a8b00e40232c82f38, it’s me.
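
        A sketch of the fixed-phrase variant, reusing the phrase from the example. An attacker who learns `PREFIX` simply rebuilds the lookup table with it, which is exactly the weakness raised next. The word list is again hypothetical.

```python
import hashlib

PREFIX = "My very fine password is "


def prefixed_md5(password: str) -> str:
    """md5 of the fixed phrase followed by the password."""
    return hashlib.md5((PREFIX + password).encode("utf-8")).hexdigest()


def check(offered: str, stored_digest: str) -> bool:
    return prefixed_md5(offered) == stored_digest


stored_digest = prefixed_md5("sheep7")

# An attacker who knows PREFIX rebuilds the table once, for every user at once.
common_words = ["password", "letmein", "sheep7"]
attacker_table = {prefixed_md5(w): w for w in common_words}
print(attacker_table.get(stored_digest))  # sheep7
```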

        This is better, but a bad guy who learns what the phrase is can build a new dictionary as above.

        So better still is to use “salt”. Under this scheme, when I set my password, the machine gets a bunch of random bits, calls that “salt,” and computes the md5 of the password plus the salt. Then it stores the salt and the md5. For example, say when I set my password, the random number generator returns 0x73e399f9. The machine computes the md5 of “sheep70x73e399f9”, which is 0631832fb724d48cef5e867a4d2a44b7, and stores that and the salt. When someone claims to be me and offers a password, the machine looks up the salt it has on record for me (0x73e399f9), combines it with the offered password, and checks whether the md5 of the combination matches 0631832fb724d48cef5e867a4d2a44b7.
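
        Per-user random salt, sketched in Python. Two assumptions differ from the worked example. The salt is appended as 4 raw random bytes (32 bits) from `os.urandom` rather than as the text "0x73e399f9". MD5 is kept only to match the discussion; a deliberately slow hash is the better choice, as later comments point out.

```python
import hashlib
import hmac
import os


def make_record(password: str) -> tuple:
    """Pick a fresh random salt and return (salt, md5(password + salt))."""
    salt = os.urandom(4)  # 32 random bits, as in the example
    digest = hashlib.md5(password.encode("utf-8") + salt).hexdigest()
    return salt, digest


def verify(offered: str, salt: bytes, digest: str) -> bool:
    """Recombine the offered password with the stored salt and compare."""
    candidate = hashlib.md5(offered.encode("utf-8") + salt).hexdigest()
    return hmac.compare_digest(candidate, digest)


salt, digest = make_record("sheep7")
print(verify("sheep7", salt, digest))  # True
print(verify("sheep8", salt, digest))  # False
```

Setting the same password twice yields two different records (unless the two salts happen to collide), so one precomputed table no longer covers everyone.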

        Under this scheme, a bad guy could try to build a dictionary, but he’d have to build one for every possible value of the salt.

        Obviously, this is not rocket science. Using salted rather than plain password hashes is so easy to do, there really isn’t any good excuse for doing it the weak, ineffective way.

        1. Thank you for this excellent explanation!

          Two follow up questions:

          If I understand you correctly, a malicious person who gains access to both the md5 hashes and the salt can easily produce a new dictionary. So to add security in practice, it seems the salt must also be harder to obtain than the password hashes. How is that achieved in practice?

          Furthermore, if the malicious person had a user account at the site whose password hashes s/he later obtains, couldn’t s/he recover the salt by brute force, using their own username, password and salted password hash from the captured hashes? (I’m assuming here that usernames and salted hashes are stored as pairs in a database.)

          1. There isn’t a single salt. Each user name has its own salt, determined at password-change time by some reasonably random process. (And new password, new salt.)

            With a large enough number of possible salts, no two users are likely to have the same one. (32 bits is enough for ~4.3 billion possibilities; 48 bits yields ~2.8×10^14.) This means a bad guy essentially has to start over for every single user name he wants to try to crack. (Even assuming he’s got the salt, too: any data leak that exposes user names and hashed passwords will probably expose the salts as well.)
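
            How likely is a shared salt? Each pair of users matches with probability 1/2^bits, so the expected number of matching pairs is easy to compute. The one-million-user figure below is an assumption chosen for illustration. With 32-bit salts it gives roughly 116 expected pairs of users sharing a salt, and with 48-bit salts about 0.002. That is why a generous salt length makes sharing a non-issue.

```python
def expected_shared_salt_pairs(users: int, salt_bits: int) -> float:
    """Expected number of user pairs drawing the same salt (each pair: 1 / 2**bits)."""
    possible_salts = 2 ** salt_bits
    return users * (users - 1) / 2 / possible_salts


USERS = 1_000_000  # hypothetical site size

for bits in (32, 48):
    print(f"{bits}-bit salt: {2 ** bits:,} values, "
          f"~{expected_shared_salt_pairs(USERS, bits):.3f} shared pairs")
```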

    2. The issue is that you don’t get to choose how the people you give your passwords to deal with them. Yes, this is a dumb way to store passwords, but there are a shocking number of dumb websites out there. When Sony, a big multinational with a reputation and security policies, has passwords stored in exactly the same way that Ars had them, you can rest assured that lots of other people fail as well. My cable company has my password stored in plain text, and their “password recovery” is to read your password back to you over the phone. You can’t choose how other people store your passwords. You should basically assume that all your passwords are in plain text or in some asshole’s Excel file, and not trust others with your security.

      The real danger is that if you use your generic password on a site with shit security, someone grabs their password file and has your e-mail and a password that you use for every other site, give or take a few variations.  You are screwed at this point.

      The lesson here is to never use the same password twice. Use a password locker of some flavor. Yes, you are weak against a single point of attack: if your computer is completely compromised (someone steals your password file and also has a key logger), you are compromised. You were fucked anyway, though, so you are no more screwed than you were before. You are, however, not vulnerable to someone else’s bad security. Some dumb website can hand out your passwords in unencrypted text and it only means that one website is compromised, rather than everything.

      If you want to be safe, use a password keeper, make all passwords random, put a big-ass password on that keeper that nothing short of an NSA computer with a lot of spare time is going to break, and use two-step authentication for anything that is important (namely, banking and your e-mail). Do that, and even a persistent threat is going to have a hard time getting into anything important. As a bonus, you don’t waste your time trying to remember a thousand variations on your one password. You just need to know one big-ass password to rule them all.
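
      Making "all passwords random" is what a password keeper's generator does for you. For the curious, here is a minimal sketch using Python's `secrets` module. The 20-character length and the 94-symbol alphabet are assumptions, and some sites reject punctuation, so the alphabet may need trimming per site.

```python
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 20) -> str:
    """Draw each character independently from a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(random_password())
```

At 20 characters from 94 symbols, that works out to about 131 bits of entropy, far beyond the reach of dictionary or rules attacks.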

      1. I agree with most of this, particularly about using a password keeper, except for some minor points: I’m a lot more comfortable having my primary email password be something in my head and not randomly generated by the computer, distinct from my vault password. If I ever lose access to the vault I can still reset everything else with the email account. Both are equally strong. So, two big ass passwords to rule them all.

        And the “no duplication” rule can be relaxed for sites with no access to your email, financial, or medical data. If someone cracks my BoingBoing password they can also post comments as me on this other site here, whoop-de-shit.

        1. I completely agree.  Your e-mail, slathered with two step authentication, is a good one to have a memorized password for.  If your password locker ever gets corrupted, compromised, whatever, you can still always get into your e-mail and reset all of your passwords.

    3. You’re quite right that plain MD5 is extremely weak, but the salting part isn’t really applicable in this case.

      Password salting is mainly a protection against in-advance hash list (aka rainbow table) computation.  Since the author didn’t pre-compute or download a rainbow table of MD5 hashes, password salts wouldn’t have made very much difference.

      In fact, an important point to get from what can be done nowadays with hashcat, particularly by someone willing to shell out a few hundred bucks for some fast graphics cards, is that rainbow table attacks are fast on their way to being obsolete, if they aren’t already.

      (EDIT: not that anyone should stop using password salts, please FSM no! What folks should be doing is continuing to use salts, and also moving to password-hashing methods that are much slower, and in particular harder to run many copies of in parallel on a GPU. bcrypt is probably a good choice.)
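
      bcrypt isn't in Python's standard library, so this sketch swaps in `hashlib.scrypt`. That is a different slow, memory-hard function with the same goal: each guess costs real time and memory, which blunts GPU parallelism. The cost parameters are assumptions to tune for your own hardware, and the call requires a Python build linked against OpenSSL 1.1 or later.

```python
import hashlib
import hmac
import os

# Assumed scrypt cost settings: n=2**14, r=8 needs about 16 MiB per hash.
N, R, P = 2 ** 14, 8, 1
MAXMEM = 64 * 1024 * 1024


def hash_password(password: str) -> tuple:
    """Return (salt, key) using a fresh 16-byte salt and scrypt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                         n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)
    return salt, key


def verify_password(offered: str, salt: bytes, key: bytes) -> bool:
    """Recompute scrypt with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(offered.encode("utf-8"), salt=salt,
                               n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)
    return hmac.compare_digest(candidate, key)


salt, key = hash_password("swordfish")
print(verify_password("swordfish", salt, key))  # True
```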

      1. The salting part is definitely applicable in this case.

        The author didn’t compute a rainbow table because he fed in dictionaries. If my password “swordfish” is in the dictionary, the cracking program will get to the dictionary entry “swordfish,” compute the MD5 hash for it, notice that the hash was in the password list, and know that my password was swordfish.

        At the same time, it would find all the other people whose password was also swordfish. That is, it would only have to check “swordfish” once against the entire list.
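
        Here is that "check each word once against the entire list" effect as a sketch. The leaked data and word list are hypothetical. The point is that the number of hashes computed depends only on the word list, not on how many users are in the dump.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical unsalted dump: user -> md5(password).
leaked = {
    "alice": md5_hex("swordfish"),
    "bob": md5_hex("swordfish"),
    "carol": md5_hex("hunter2"),
}


def crack_unsalted(leaked: dict, wordlist: list):
    """One hash per word, matched against every user at once."""
    users_by_hash = {}
    for user, digest in leaked.items():
        users_by_hash.setdefault(digest, []).append(user)
    found, hashes_computed = {}, 0
    for word in wordlist:
        digest = md5_hex(word)
        hashes_computed += 1
        for user in users_by_hash.get(digest, []):
            found[user] = word
    return found, hashes_computed


found, work = crack_unsalted(leaked, ["password", "swordfish", "hunter2"])
print(found, work)
```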

        However, if you salt a password, even with something as simple as the user name, then you’ve made this problem orders of magnitude more difficult.

        My salt is “SamSam”, so my password is “swordfishSamSam.” To find my password, you have to create a new dictionary that adds “SamSam” to the end of every word. Now you run the dictionary attack against me, and after a few minutes it finds a match and has my password. 

        But it only has mine.

        Then it has to find your salt, create a new dictionary, and run the attack against you. Then the next user, and the next. You can no longer run a single attack against all million users, you have to run a separate attack on each user, making it a million times slower.
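
        The salted version of the same attack, using the user name as the salt as in the "SamSam" example. The dump and word list are hypothetical. Because each candidate must be rehashed with each user's salt, the work grows with the number of users instead of staying fixed.

```python
import hashlib


def salted_md5(password: str, salt: str) -> str:
    """Salt appended to the password, as in "swordfishSamSam"."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


# Hypothetical salted dump: user -> (salt, hash). Both users chose "swordfish".
leaked = {
    "SamSam": ("SamSam", salted_md5("swordfish", "SamSam")),
    "alice": ("alice", salted_md5("swordfish", "alice")),
}


def crack_salted(leaked: dict, wordlist: list):
    """A separate dictionary run per user: rehash every word with that user's salt."""
    found, hashes_computed = {}, 0
    for user, (salt, target) in leaked.items():
        for word in wordlist:
            hashes_computed += 1
            if salted_md5(word, salt) == target:
                found[user] = word
                break  # this user is done; move on to the next salt
    return found, hashes_computed


found, work = crack_salted(leaked, ["password", "swordfish", "hunter2"])
print(found, work)
```

In the worst case the work is (words × users) hashes rather than just (words).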

          1. You’re right: salting very nearly eliminates the opportunity to use precomputed hashes, and unique salts do reduce the degree of parallelism the attacker can get.

          That’s still not enough when dealing with oclHashcat (the version that uses the GPU for computations). Slowing the attacker by a factor of a million just means they have time to make a fresh pot of coffee before they start to get cracked passwords, rather than just pouring a cup from the pot that’s already there…

  2. In related news, 99% of all password-based security imposes arbitrary limits on both the length and the characters used in passwords, making them significantly less secure than they otherwise could be.

    If we can type it, we should be able to use it as a password.

    1. You should just assume that all passwords you give out are going to be cracked.  The moral of the story is to never use the same password twice.  If you are not using a password locker and are reusing passwords, you are just a sucker waiting to happen.

      Don’t get me wrong, I am all for finger-wagging at people who use bad security when storing your password, but it isn’t going to save you. Random, different passwords and two-step authentication on everything important is about as good as you are going to do. It won’t save you from the NSA, perhaps, but it will defend you against most other attacks.

  3. Just one thing: we are all tech-interested here, otherwise we would not have read this thread. Is there anything like a how-to-motivate-your-pals?

    In one of my former academic working environments, 15 out of 15 people used Dropbox, but only 2 out of 15 used Boxcryptor for sensitive files. And I am the only one who even tried out OpenPGP, TrueCrypt and KeePass.

    What really worries me is that people who work with R, Matlab, and in some cases C++ on a daily basis have no problem storing plain-text passwords for a web server on an unsecured NAS, and using the same password for all general work purposes. How on earth can I explain to my mother that she could use a password keeper if even my former colleagues don’t go for it?

    1. You may find it easier because people who should know better but don’t are often far more set in their ways.

      That, and the more technically inclined may have set up hundreds of accounts, choosing to use a common password, well before password keepers became practical to use. There is, therefore, inertia due to the effort of adding the accounts to a password keeper and changing the password for each one.

      1. Good reasoning.

        Now, how do we overcome this? For the life of me, I can’t figure out a good strategy to change this behaviour in my peers, let alone in my family.

        Maybe we just need some people working in advertising to change that. Maybe someone can get some celebrities who have been hacked to be featured in a TV commercial…
        Ah, the irony of it.

  4. I’m on the edges of security.  Let me ask – are systems that (1) force you to wait 10 seconds before entering another password, and (2) lock you out after 3 false entries – are they more secure?

    1. Typically, no. These are steps taken by people who haven’t actually thought through their password security. 

      Delays between password entries aren’t actively bad, but they are annoying to end users and in the long run don’t really make much difference in the time it would take to brute-force a password from the outside – 100ms versus 10 seconds when you’re talking millions or billions of attempts just doesn’t matter. Such attacks are also “noisy” and high-profile, which means they are short-lived, making them even less useful.

      Lockouts, on the other hand, ARE actively bad: they make the site or software vulnerable to denial-of-service attacks, especially if the usernames are predictable (think about how many companies build email addresses from employees’ names in a fixed pattern, and how many of those addresses are publicly available).
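
      The denial-of-service problem with lockouts is easy to see in a toy model. This sketch assumes the "3 false entries" policy from the question and a guessable user name (hypothetical here). The plain-text password dictionary is only a stand-in to keep the example short, not how credentials should be stored.

```python
MAX_FAILURES = 3  # "lock you out after 3 false entries"


class LockoutLogin:
    def __init__(self, passwords: dict):
        self.passwords = passwords  # stand-in for a real hashed-password check
        self.failures = {}

    def login(self, user: str, offered: str) -> str:
        if self.failures.get(user, 0) >= MAX_FAILURES:
            return "locked"
        if self.passwords.get(user) == offered:
            self.failures[user] = 0
            return "ok"
        self.failures[user] = self.failures.get(user, 0) + 1
        return "denied"


site = LockoutLogin({"jane.doe": "correct horse"})
for guess in ("a", "b", "c"):  # an attacker who only knows the user name
    site.login("jane.doe", guess)
print(site.login("jane.doe", "correct horse"))  # locked
```

The attacker never learned the password, yet the real user is now shut out.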

      Both of these defenses are designed to deal with passwords that are extremely weak, like 4-digit PINs or four character alpha passwords. A much better way to deal with that problem is simply to require longer, more complex passwords – as the chart in the article shows, 9 or 10 characters is more than enough to resist a hash attack, let alone a brute force attempt, and makes it far less likely that the password will match one of the lists like RockYou. People can easily remember 9-10 character passwords, as phone numbers are 10 digits and social security numbers are 9 digits.
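
      Whether 9 or 10 characters is enough depends heavily on the alphabet. Here is a quick keyspace comparison. It assumes a purely hypothetical offline rate of one billion guesses per second and an attacker who must try every combination. Under those assumptions, a 10-digit, phone-number-style password falls in about 10 seconds, 9 lowercase letters in about 1.5 hours, and 9 characters drawn from all 95 printable ASCII characters in about 20 years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
GUESSES_PER_SECOND = 1e9  # hypothetical offline cracking rate


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly this length over this alphabet."""
    return alphabet_size ** length


cases = {
    "10 digits": keyspace(10, 10),
    "9 lowercase letters": keyspace(26, 9),
    "9 printable ASCII chars": keyspace(95, 9),
}

for name, space in cases.items():
    seconds = space / GUESSES_PER_SECOND
    print(f"{name}: {space:,} candidates, {seconds:,.0f} s "
          f"({seconds / SECONDS_PER_YEAR:.2f} years) to exhaust")
```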

      1. I have to disagree with you completely on the delay argument, but then agree with the conclusion for another reason.

        Delays between password attempts absolutely affect whether or not you can brute-force a password. The entire reason the author was able to crack these passwords, and write the article, is that computers have gotten fast enough to calculate millions of MD5 hashes per second.

        If you can try a million passwords a second, and there are a billion likely possibilities (not an unreasonable number), then it will take you about 17 minutes (1,000 seconds) to crack your way in.

        Next year, when a laptop can calculate ten million hashes per second, the same billion attempts will take under two minutes.

        But if the system forced you to wait a second between each attempt? Now it doesn’t matter how fast your computer becomes, it will always take you a billion seconds, or 31 years, to try all billion possibilities.

        That is the point. There is a very big difference between two minutes and 31 years, and those 31 years will stay 31 years no matter how fast computers get.
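
        The arithmetic above, spelled out. The billion-candidate search space and the guess rates are the comment's own illustrative figures, and the online case assumes an enforced one-second delay per attempt. The results come to about 16.7 minutes, 1.7 minutes, and 31.7 years respectively.

```python
CANDIDATES = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def offline_seconds(hashes_per_second: float) -> float:
    """Time to try every candidate against a stolen hash list."""
    return CANDIDATES / hashes_per_second


def online_years(delay_seconds: float) -> float:
    """Time to try every candidate when the server enforces a delay per attempt."""
    return CANDIDATES * delay_seconds / SECONDS_PER_YEAR


print(offline_seconds(1_000_000) / 60)   # minutes at a million hashes/s
print(offline_seconds(10_000_000) / 60)  # minutes at ten million hashes/s
print(online_years(1.0))                 # years with a 1-second delay
```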

        However, in practice this is all irrelevant, because that isn’t how password cracking works. No one tries to feed Gmail a million passwords a second, or even one password a second, because Gmail would notice and shut down the requests. Instead, they do what this article talks about: find a dumped list of hashed passwords and run their millions of calculations on their own computers.
