Jarno, at F-Secure — an excellent Finnish anti-malware company — has posted a nice analysis of encoding password files. Because he assumes some knowledge of the way things work, I’ll try to expand a bit on that here. Some of this has been in these pages before, so this is a review.
A cryptographic hash algorithm is a mathematical algorithm that will take some piece of data as input, and will generate as output a piece of data — a number — of a fixed size. The output is called a hash value, or simply a hash (and it’s sometimes also called a digest). The algorithm has the following properties:
- It’s computationally simple to run the algorithm on any input.
- Given two different inputs, however similar, it’s very likely that the hashes will be different (it is collision resistant).
- Given a hash value, it’s computationally infeasible to determine an input that will generate that hash (it is preimage resistant).
- Given an input, it’s computationally infeasible to choose another input that gives the same hash (it has second preimage resistance).
Cryptographic hash algorithms go by names like MD5 (for
Message Digest) and SHA-1 (for
Secure Hash Algorithm), and they’re used for many things. Sometimes they’re used to convert a large piece of data into a small value, in order to detect modifications to the data. They’re used that way in digital signatures. But sometimes they’re just used to hide the original data (which might actually be smaller than the hash value).
Unix systems used to store user names and passwords in a file called
/etc/passwd, with the passwords hashed to hide (obfuscate) them. A standard attack was to find a way to get a copy of a system’s
/etc/passwd file, and try to guess the passwords offline. If you know what hash algorithm they’re using, that’s easy: guess a password, hash it, then look in the
/etc/passwd file to see if any user has that hash value for its password.
Nowadays, most systems have moved away from storing the passwords that way, but there are still services that do it, there are still ways of snatching password files, and the attack’s still current. Jarno’s article looks at some defenses.
Salting the hashed passwords involves including some other data along with the password when the hash is computed, to make sure that two different users who use the same password will have different hashes in the password file. That prevents the sort of global attack that says,
Let’s hash the word ‘password’, and see if anyone’s using that. Of course, if the
salt is discoverable (it’s the user name, or something else that’s stored along with the user’s information), users’ passwords can still be attacked individually.
Even using individual attacks, it’s long been easy to crack a lot of passwords offline: we know that a good portion of people will use one of the 1000 or so most popular passwords (
123456, and so on), and it never has taken very long to test those. Even if that only nets the attacker 5% of the passwords in the database, that’s pretty good. But now that processors are getting faster, it’s feasible to test not only the 1000 most popular passwords, but tens or hundreds of thousands. All but the best passwords will fall to a brute-force offline attack.
The reason offline attacks are important is that most systems have online protections: if, as an attacker, you actually try to log in, you’ll only be allowed a few tries before the account is locked out and you have to move on to another. But if you can play with the password file offline, you have no limits.
Of course, the best defense is for a system administrator to make sure no one can get hold of the system’s or the service’s password file. That said, one should always assume that will fail, and someone will get the file. Jarno suggests the backup defense of using different salt values for each user and making a point of picking a slow hash algorithm. The reasoning is that it doesn’t make much difference if it takes a few hundred milliseconds for legitimate access — it doesn’t matter if a login takes an extra quarter or half second — but at a quarter of a second per attempt, it will be much harder for an attacker to crack a bunch of passwords on the system.
Just two small points:
First, Jarno recommends specific alternatives to SHA-1, but he doesn’t have it quite right. PBKDF2 and HMAC are not themselves hash algorithms. They are algorithms that make use of hash algorithms within them. You’d still be using SHA-1, but you’d be wrapping complexity around it to slow it down. That’s fine, but it’s not an alternative to SHA-1.
The same is the case for bcrypt, only, worse, bcrypt uses a non-standard hash algorithm within it. I would not recommend that, because the hash algorithm hasn’t been properly vetted by the security community. We don’t really know how its cryptographic properties compare with those of SHA-1.
Second, Jarno suggests that as processors get faster, the hashing can be changed to maintain the time required to do it. He’s right, but that still leaves an exposure: because the server doesn’t have the passwords (only the hashes of the passwords), no hash can be changed until the user logs in. If the system doesn’t lock out unused accounts periodically, those unused accounts become weak points for break-ins over time.
That said, this is sound advice for system administrators and designers. And perhaps at least a little interesting to some of the rest of you.