Storing Passwords - The Wrong, Better and Even Better Way

If you've ever had to sign up to use a website, you'll no doubt have been prompted to provide a username and password, so that when you next visit the site you can log in without having to fill in all of your details again. Your password has to be stored somewhere, otherwise you won't be able to log in the next time you visit. Right? Unfortunately, a few sites I've come across do just this: they store your actual password, which means that if the information is stolen, someone has your password.

In a perfect world, everyone would use a different password for every account they sign up for, and that password would be a combination of numbers, letters (uppercase and lowercase) and special characters, at least 20 characters long. But let's be honest here: we're not all memory machines, and remembering cryptic combinations like that isn't something everyone can do. So people are tempted to choose just one password and use it on everything they sign up for, including their email account. That means anyone who gets hold of a site's stored passwords may end up with both an email address and the password for that email account. Not a very good thing.

There's no way around this: the weakest point in any security system is the human element. People are always going to choose easy passwords, or use the same password for multiple sites. No amount of security can make up for users picking a bad password to begin with, but we can still protect those who use the same password for everything. It's up to us as web developers to help keep their passwords secure, so that these people never have to deal with someone getting into their other accounts.

This all comes down to how you store the password. Do you do it the "Wrong Way", the "Better Way" or the "Even Better Way"? (I'm not going to say the "Right Way", because I don't think there is such a thing when it comes to password security).

The Wrong Way


I was shocked to recently discover a website which stored its users' passwords in plain text. I couldn't quite grasp how anyone doing modern web development could store passwords like that, but it still happens. Storing passwords as plain text is a very bad thing. It means that if I sign up for a website using the password "iamawesome", then the value stored in their database is also "iamawesome". Any employee with access to the database (and there will always be at least one) can see your password. They can probably also see the email address associated with the account. So if you're one of those people who uses the same password everywhere, there's nothing to stop that person getting into your other accounts. I like to think the majority of people in this position are honest and would never use that information for such nefarious purposes, but by the same token, I'm willing to bet there are people out there who aren't so honest.

So how can you tell if a website stores your password securely? The simple answer is that you can't. But there are a few telltale signs that they're not doing all they can. Many websites have a "Forgotten password" function. If you go through this procedure and get an email containing your original password, then it is highly likely they're storing your password in plain text. (This isn't always the case: they could be using reversible encryption. But again, the developers will know how to reverse the encryption, so while it may stop someone who steals only the database, it wouldn't stop dishonest employees.)

Personally, I would stop using such a service straight away (and in most cases I will email the relevant department to warn them that they're storing things insecurely). There is no need for a website to store that information in plain text. None at all. It puts your information at risk. There's a much better way to store passwords.

The Better Way


In order to log in to a website, there's no need for the site to actually know what the user's password is; it just needs to be able to tell whether you've entered the same password as when you registered.

You can use something called a "hash" to do this. A hash function is a one-way mathematical function which, given an input A, will always produce the same output B, but which ideally makes it very difficult to get from B back to A. (Difficult, but not impossible: for common inputs you could use a precomputed table, such as a rainbow table, to look up A from the hash B.)

Note: It is possible for more than one input value to result in the same hash. This is called a hash collision. With a well-designed hash function an accidental collision is extremely unlikely, but with weaker algorithms collisions can be found deliberately. I've never come across one myself, although it does happen. The danger of a hash collision is that an attacker doesn't need to know the original password: if they can find something which hashes to the same result, they can just use that.

Different algorithms hash things in different ways, and some are better than others. MD5 is a common one used in lots of tutorials, but it has serious collision weaknesses and is universally considered to be cryptographically broken. SHA-1 has similar collision weaknesses. The bottom line is you shouldn't use MD5 or SHA-1. Others, such as the SHA-2 family of algorithms, currently have no known practical collision attacks. I'm going to use SHA-256 in these examples, but you could use any number of other hashing algorithms. There is still a problem with using these algorithms for passwords (they're too fast), but I'll get to that later.

PHP 5 has the built-in hash() function, which I will use to hash with SHA-256,
$password_hash = hash("sha256", "iamawesome");
// 4aa4029d0d0265c566c934de4f5e0a36496c59c54b6df8a72d9c52bdf0c1a0e8


The idea behind using a hash is that you store the hash in your database rather than the plain text password. To verify that a user has entered the correct password when logging in, you apply the same hash function to whatever text they enter, then compare the result to what you have stored in the database. If they match, the correct password was entered.
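$user_entered = hash("sha256", $_POST['password']);
// Use hash_equals() (PHP 5.6+), not ==, which can treat hashes like "0e123..." as equal numbers
return hash_equals($password_from_db, $user_entered);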


This way, only the user ever knows the real password. If someone were to look at your stored hashes (whether it's a dishonest employee or someone who has stolen the database), they'll only ever see the hashes, and won't be able to go around getting into people's email accounts.

So you might be thinking that it's problem solved: let's just do this hashing stuff and we'll be secure. Well, no. There are still some problems with this method. Suppose two people have the same password; this means they will have the same hash in the database. Now suppose you manage to trick one of these people into giving you their real password (via a phishing email, for example). You would now have the password for anyone with the same hash.
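A quick way to see the problem is to hash the same password for two users and compare the stored values. This is a minimal sketch; the user names are made up for illustration.

```php
<?php
// Two hypothetical users who happen to pick the same password.
$stored = [
    'rich' => hash('sha256', 'iamawesome'),
    'bob'  => hash('sha256', 'iamawesome'),
];

// With a plain (unsalted) hash the stored values are identical, so learning
// one user's password reveals every other account that shares the hash.
$same = ($stored['rich'] === $stored['bob']);
echo $same ? "identical hashes\n" : "different hashes\n"; // prints "identical hashes"
```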

One of the most basic password attacks is called a dictionary attack. At its simplest, this involves trying every word in the English dictionary as the password to log in as someone, but more commonly the "dictionary" (a list of strings, not a language dictionary) contains common passwords or every combination of characters up to a certain length. This generally works because it's quite common for people to use a dictionary word or a common sequence of characters as their password. A dictionary attack is still possible when using hashes: you can generate a list of hashes from a dictionary and compare it to the hashes in the database, and you'll probably find a few matches, at which point you have the password for those people. A precomputed list like this is essentially a massive lookup table of hashes and their corresponding inputs; a rainbow table is a more compact, space-saving variant of the same idea.
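Here is a toy version of that attack against unsalted SHA-256 hashes. The word list and the "stolen" hashes are invented for illustration; a real dictionary would hold millions of entries.

```php
<?php
// Illustrative only: a tiny dictionary and a stolen table of unsalted hashes.
$dictionary = ['password', '123456', 'letmein', 'iamawesome'];

$stolen = [
    'rich' => hash('sha256', 'iamawesome'),
    'bob'  => hash('sha256', 'correct horse battery staple'),
];

// Precompute hash => plaintext once; the same table works against any
// database that stores unsalted SHA-256 hashes.
$lookup = [];
foreach ($dictionary as $word) {
    $lookup[hash('sha256', $word)] = $word;
}

$cracked = [];
foreach ($stolen as $user => $hash) {
    if (isset($lookup[$hash])) {
        $cracked[$user] = $lookup[$hash];
    }
}
print_r($cracked); // only rich's password is in the dictionary
```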

So really all we've done is obfuscated the password from people viewing it directly, but certain attacks are still possible. There is still a better way.

The Even Better Way


The way around these problems is to use something called a "salted hash". A salt is defined as "random bits that are used as an input to a key derivation function"; it's similar in spirit to a nonce. Normally, to create your hash you provide one thing as input (the original password) and get the hash as output. A salt is a random string of characters that you use as an additional input to the hash function.

So now when storing a user's password, instead of just hashing the password, you concatenate the password and the salt and hash that instead. In PHP it would look something like this,
$salted_hash = hash("sha256", $password . $random_salt);


The salt should be a string of random characters generated with a cryptographically secure random number generator. Ideally it should be long (more than 20 characters) and not just alphanumeric; it should include special characters too.
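A minimal sketch of a salt generator, assuming PHP 7 or later for random_int(), which draws from the operating system's secure random source. The helper name generate_salt() and the character set are my own choices: it mixes letters, digits and punctuation but leaves out '$', quotes and backslashes, which can cause trouble if the salt is later stored in a delimited field or pasted into SQL.

```php
<?php
// Hypothetical helper: builds a salt of $length characters using the CSPRNG.
function generate_salt(int $length = 32): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
              . '0123456789!#%&()*+,-./:;<=>?@[]^_{|}~';
    $max = strlen($alphabet) - 1;
    $salt = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is unbiased and cryptographically secure, unlike rand().
        $salt .= $alphabet[random_int(0, $max)];
    }
    return $salt;
}

echo generate_salt(), "\n";
```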

You can combine them however you want, as long as both the random salt and the password are used to construct the hash you store. The method you choose doesn't change the effectiveness of your password storage; one is not really any more secure than another. Relying on the particular way you combine the password and salt to provide security is security through obscurity and should be avoided.
$salted_hash = hash("sha256", hash("sha256", $password) . $random_salt);
$salted_hash = hash("sha256", hash("sha256", $password) . hash("sha256", $random_salt));
// ...etc


In the database, you also need to store each user's salt alongside the completed salted hash, otherwise you won't be able to tell whether the user has entered the correct password. Each user should have their own random salt; if you use the same one for the entire database or for multiple users, you've completely negated the point of using a salt in the first place. You should also use a site-wide salt (sometimes called a pepper) in addition to the per-user salt. The pepper is stored on the filesystem rather than in the database, the idea being that if only your database is compromised, an attacker will have the user salts but not the application-wide one.
$salted_hash = hash("sha256", $random_salt . $sitewide_salt . $password);
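Putting registration and login together might look like the sketch below. The function names and the placeholder pepper are hypothetical, and the per-user salt here is bin2hex(random_bytes(16)) (PHP 7+), a hex string carrying 128 random bits that stores safely in any text column; a salt with special characters works the same way.

```php
<?php
// Hypothetical helpers. $sitewide_salt (the pepper) would normally be read
// from a config file on the filesystem, never from the database.
function salted_hash(string $password, string $salt, string $sitewide_salt): string
{
    return hash('sha256', $salt . $sitewide_salt . $password);
}

function check_password(string $entered, string $salt, string $sitewide_salt, string $stored_hash): bool
{
    // hash_equals() compares in constant time and avoids loose == pitfalls.
    return hash_equals($stored_hash, salted_hash($entered, $salt, $sitewide_salt));
}

// Registration: generate a per-user salt and store it next to the hash.
$sitewide_salt = 'example-pepper-from-config'; // placeholder value
$salt = bin2hex(random_bytes(16));
$stored_hash = salted_hash('iamawesome', $salt, $sitewide_salt);
```

At login you would load the user's salt and stored hash, then call check_password() with whatever they typed.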


Your database table should look something like this,
username         password_hash          password_salt
rich             2bae773debd80de...     ?hb-:4a-loDC90^n#=R...
bob              d82ff2c12d5065f...     2g}iT'JG><,?wP6{#VG...


So now the hash you store will be different for every user in your database, even if two users have the same password. Anyone who wants to mount a dictionary attack by precomputing a rainbow table has to precompute it for each user individually rather than for the entire database at once, making it much more difficult and often infeasible.
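To check that claim, the sketch below gives two hypothetical users the same password but their own random salts (PHP 7+ for random_bytes()); the pepper is a placeholder.

```php
<?php
$sitewide_salt = 'example-pepper-from-config'; // placeholder
$users = [];
foreach (['rich', 'bob'] as $name) {
    $salt = bin2hex(random_bytes(16));
    $users[$name] = [
        'salt' => $salt,
        'hash' => hash('sha256', $salt . $sitewide_salt . 'iamawesome'),
    ];
}

// The stored hashes differ, and neither matches a precomputed unsalted hash
// of the same word, so one lookup table no longer covers both accounts.
$unsalted = hash('sha256', 'iamawesome');
var_dump($users['rich']['hash'] === $users['bob']['hash']);             // bool(false)
var_dump(in_array($unsalted, array_column($users, 'hash'), true));      // bool(false)
```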

(Note: You can never make it impossible to crack someone's password; there will always be a way. You can just make it very, very difficult.)

But now you have another problem: how do you implement the "I've forgotten my password" functionality if you can't tell the user their password? Rather than retrieving the user's original password (which you can't do when using salted hashes), you instead want to verify the user by getting them to confirm some other information on their account, then send them an email with a confirmation URL or number in it. When they click this URL (or enter the number), they should be taken to a page where they can set a new password for their account. You can then feel reasonably safe that the person setting the new password knows information about the account and has access to the email address. You should never email a new password to your users, because then if their email account is compromised, so is their account. Most mainstream websites nowadays do this, and it is usually a good sign that they're storing your password properly.
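One way to generate and check the confirmation token is sketched below, assuming PHP 7+. The helper names and the one-hour expiry are hypothetical, and storing only a SHA-256 of the token (so a leaked database can't be used to reset accounts) is my own addition rather than something described above.

```php
<?php
// Hypothetical helpers; saving the returned values is left to your own code.
function create_reset_token(int $ttl_seconds = 3600): array
{
    $token = bin2hex(random_bytes(32)); // 64 hex chars, goes in the emailed URL
    return [
        'token'      => $token,
        'token_hash' => hash('sha256', $token), // store this, not the token
        'expires_at' => time() + $ttl_seconds,
    ];
}

function verify_reset_token(string $submitted, string $stored_hash, int $expires_at): bool
{
    if (time() > $expires_at) {
        return false; // link has expired
    }
    return hash_equals($stored_hash, hash('sha256', $submitted));
}

$reset = create_reset_token();
```

Once the token is used, you would delete the stored hash so the same link can't be reused.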

Slow It Down


But wait, there's still more to do! While with most algorithms you aim to make things as fast as possible, with password hashing you want the opposite. The traditional hashing algorithms like MD5, SHA-256, SHA-512, etc. all have a serious problem when it comes to password storage: they're too fast. Now that you have salted passwords stored, what if someone were to try to crack them?

While a rainbow table is a precomputed table of the hashes for a set of plaintext passwords, another method of cracking involves feeding candidate passwords (from a dictionary, or every possible combination of characters), together with the salt stored alongside each hash, into your hashing algorithm at attack time rather than precomputing the results. These are called incremental password crackers. Rather than using space to attack the passwords (a massive precomputed lookup table), they use time (hashing each guess until a match is found). With services like Amazon EC2, you can rent massive amounts of computing power for very little cost to do all the hard work for you.

If your password hashing function is very fast, the incremental method will also be fast, and the password can be cracked quickly. If it takes 0.00001 seconds for your hash function to return, someone can try 100,000 passwords a second until they find the password. If it takes 1 second for your hash function to return, that's not a big deal for someone logging in to your application, but for cracking it's a very big deal: each guess now takes 1 second, so finding the password takes 100,000 times as long as it would with your original hash function.
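To put numbers on this, here is a back-of-the-envelope calculation for exhausting every 8-character lowercase password (26^8 candidates) on a single machine at the two speeds above. The length and alphabet are assumptions chosen for illustration; on average an attacker finds the password after searching about half the space, and can split the work across many machines.

```php
<?php
$candidates = 26 ** 8;   // 208,827,064,576 lowercase 8-character strings
$fast_rate  = 100000;    // guesses per second at 0.00001 s per hash
$slow_rate  = 1;         // guesses per second at 1 s per hash

$fast_days  = $candidates / $fast_rate / 86400;
$slow_years = $candidates / $slow_rate / (365 * 86400);

printf("fast hash: %.1f days\n", $fast_days);   // fast hash: 24.2 days
printf("slow hash: %.0f years\n", $slow_years); // slow hash: 6622 years
```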

So how do you slow it down? Either use a hashing algorithm specifically designed to be slow (like bcrypt), or run a standard hash function many times. This is called key strengthening (or sometimes key stretching): feeding the output of the hash function back into itself for thousands of iterations.
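For comparison, PHP 5.5 and later ship hash_pbkdf2(), an implementation of PBKDF2 (specified in RFC 8018), which is a standardised way of applying a hash function many times with a salt. This sketch assumes PHP 7+ for random_bytes(); the iteration count is only an example.

```php
<?php
$salt = random_bytes(16);
$iterations = 100000;

$start = microtime(true);
// Returns a hex string; length 0 means "full digest length" (64 hex chars).
$derived = hash_pbkdf2('sha256', 'iamawesome', $salt, $iterations);
$elapsed = microtime(true) - $start;

printf("%d iterations took %.3f s\n", $iterations, $elapsed);
```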

So now your password hashing method becomes this,
$iterations = 100000;
$salted_hash = hash("sha256", $random_salt . $sitewide_salt . $password);
for ($i = 0; $i < $iterations; $i++)
{
    // feed the previous hash back in, together with both salts
    $salted_hash = hash("sha256", $random_salt . $sitewide_salt . $salted_hash);
}
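Wrapping that loop in a function makes sure registration and login run exactly the same steps. This is a minimal sketch; the function names, placeholder pepper and reduced iteration count are hypothetical.

```php
<?php
function stretched_hash(string $password, string $salt, string $sitewide_salt, int $iterations): string
{
    $hash = hash('sha256', $salt . $sitewide_salt . $password);
    for ($i = 0; $i < $iterations; $i++) {
        $hash = hash('sha256', $salt . $sitewide_salt . $hash);
    }
    return $hash;
}

function check_stretched(string $entered, string $salt, string $sitewide_salt, int $iterations, string $stored_hash): bool
{
    // Recompute using the iteration count stored for this user, then compare safely.
    return hash_equals($stored_hash, stretched_hash($entered, $salt, $sitewide_salt, $iterations));
}

$sitewide_salt = 'example-pepper-from-config'; // placeholder
$salt = bin2hex(random_bytes(16));
$iterations = 10000; // kept small so the example runs quickly
$stored = stretched_hash('iamawesome', $salt, $sitewide_salt, $iterations);
```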


You should also store the number of iterations in your database, since you will want to increase it in future as hardware gets faster. If you didn't store it, users whose hashes were created with the old count would be unable to log in once you changed it. Your database table should now look like this,
username      password_hash          password_salt           hash_iterations
rich          2bae773debd80de...     ?hb-:4a-loDC90^n#=R...  100000
bob           d82ff2c12d5065f...     2g}iT'JG><,?wP6{#VG...  150000
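To decide what count to use for new hashes, you could time a sample run on your own hardware and scale it to a target duration. This sketch is hypothetical: the helper name, the 0.25-second target and the 10,000-iteration sample are arbitrary choices, and the result will vary from machine to machine.

```php
<?php
// Times $sample iterations of the stretching loop and scales the count so
// that one full hash takes roughly $target_seconds on this machine.
function calibrate_iterations(float $target_seconds, int $sample = 10000): int
{
    $hash = hash('sha256', 'calibration');
    $start = microtime(true);
    for ($i = 0; $i < $sample; $i++) {
        $hash = hash('sha256', 'salt' . $hash);
    }
    $elapsed = max(microtime(true) - $start, 1e-6); // avoid division by zero
    return max($sample, (int) ($sample * $target_seconds / $elapsed));
}

$iterations = calibrate_iterations(0.25);
echo "use $iterations iterations for new hashes\n";
```

Existing rows keep their own stored count, so older users can still log in.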


Some people like to store all of these in one field, with a predefined separator. That's fine too; it doesn't make much difference other than how you write the code to extract the values. Just make sure the separator you use can't appear in the hash or salt, or you'll run into issues when extracting the values. Remember not to store the site-wide salt in the database; it should be stored on the filesystem with your application.
username      password
rich          $100000$2bae773debd80de...$?hb-:4a-loDC90^n#=R...$
bob           $150000$d82ff2c12d5065f...$2g}iT'JG><,?wP6{#VG...$
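Packing and unpacking that single field could look like this. The helper names are hypothetical, and the code assumes neither the hash (hex) nor the salt contains '$'; it refuses to build a field that would be ambiguous.

```php
<?php
function encode_password_field(int $iterations, string $hash, string $salt): string
{
    foreach ([$hash, $salt] as $part) {
        if (strpos($part, '$') !== false) {
            throw new InvalidArgumentException('value contains the $ separator');
        }
    }
    return '$' . $iterations . '$' . $hash . '$' . $salt . '$';
}

function decode_password_field(string $field): array
{
    // '$100000$abc$xyz$' splits into ['', '100000', 'abc', 'xyz', ''].
    $parts = explode('$', $field);
    if (count($parts) !== 5 || $parts[0] !== '' || $parts[4] !== '') {
        throw new InvalidArgumentException('malformed password field');
    }
    return ['iterations' => (int) $parts[1], 'hash' => $parts[2], 'salt' => $parts[3]];
}
```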


You can take this a step further to future-proof it by including an identifier for the algorithm you're using. If you're using SHA-256 to hash passwords right now, in a few years you may want to use something better (there could be a sha1024, for example). If you include something that tells you which algorithm was used, you'll be able to use newer or more appropriate algorithms without changing your code other than to add the new algorithm, and you'll still be able to check passwords that were hashed using the older method. Supposing SHA-256 had an identifier of 5 (which it does in crypt), your database would now look like this,
username      password
rich          $5$100000$2bae773debd80de...$?hb-:4a-loDC90^n#=R...$
bob           $5$150000$d82ff2c12d5065f...$2g}iT'JG><,?wP6{#VG...$
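Verification then starts by reading the identifier and picking the matching algorithm. In this sketch '5' and '6' borrow crypt's identifiers for SHA-256 and SHA-512, but the field layout is this article's own scheme, not crypt's actual format; all function names are hypothetical and PHP 7.1+ is assumed.

```php
<?php
// Hypothetical map from identifier to a hash() algorithm name.
function hash_algorithm_for(string $id): ?string
{
    $map = ['5' => 'sha256', '6' => 'sha512'];
    return $map[$id] ?? null;
}

function stretch(string $algo, string $password, string $salt, string $sitewide_salt, int $iterations): string
{
    $hash = hash($algo, $salt . $sitewide_salt . $password);
    for ($i = 0; $i < $iterations; $i++) {
        $hash = hash($algo, $salt . $sitewide_salt . $hash);
    }
    return $hash;
}

function make_password_field(string $password, string $id, int $iterations, string $salt, string $sitewide_salt): string
{
    $algo = hash_algorithm_for($id);
    if ($algo === null) {
        throw new InvalidArgumentException("unknown algorithm id $id");
    }
    $hash = stretch($algo, $password, $salt, $sitewide_salt, $iterations);
    return '$' . $id . '$' . $iterations . '$' . $hash . '$' . $salt . '$';
}

function verify_password_field(string $entered, string $field, string $sitewide_salt): bool
{
    // '$5$100000$<hash>$<salt>$' splits into ['', '5', '100000', '<hash>', '<salt>', ''].
    $parts = explode('$', $field);
    if (count($parts) !== 6 || hash_algorithm_for($parts[1]) === null) {
        return false; // malformed field or unknown algorithm
    }
    $candidate = stretch(hash_algorithm_for($parts[1]), $entered, $parts[4], $sitewide_salt, (int) $parts[2]);
    return hash_equals($parts[3], $candidate);
}
```

New passwords could be written with '6' while older '5' rows keep verifying unchanged.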


Hashed, salted, strengthened and future-proofed. If a better hashing algorithm becomes available, you can implement it without affecting current users. Likewise, if you get new hardware, you can increase the number of iterations without affecting current users. If you have dishonest employees, they won't be able to get the passwords, since they're hashed. If someone were to steal your database, they'd either need to compute a rainbow table for every single user in the database, taking a very long time, or spend 1 second per guess running a dictionary through an incremental password cracker, also taking a very long time. Either way, it's going to be difficult to crack. But not impossible; it never is!

The Even Better Than The Even Better Way


Security is hard, and I'm not a security expert. Everything I've written above is just how I understand it right now. There could be important things I've missed, or, even worse, things I've misunderstood and presented incorrectly (please do let me know if that's the case). Unless you understand the intricacies of cryptographic algorithms and cryptography in general, you shouldn't try to implement a security system yourself. There are plenty of great libraries out there, built by real security experts, that have been tried and tested in the real world. A simple mistake somewhere in a custom-built system can go unnoticed until it's too late.

Nothing I've written about here is new; in fact, most of it dates back to 1976. If any of this page is news to you, that just proves my point that you shouldn't custom-build a security system.

I'll say this once more, because it's very important: use a well-established library that has been tried and tested in the field and written by real security experts. There are plenty of options out there, bcrypt and scrypt being popular choices. PHP has the built-in crypt function, and there are libraries such as phpass which handle everything for you. All of these incorporate everything I've talked about above.
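Assuming PHP 5.5 or later, PHP also has password_hash(), password_verify() and password_needs_rehash(), which wrap crypt()'s bcrypt support and handle the salt, cost and algorithm identifier for you. A minimal sketch, with a cost of 12 chosen only as an example:

```php
<?php
// password_hash() generates the salt itself and returns one string holding
// the algorithm id, cost, salt and hash (e.g. "$2y$12$...").
$stored = password_hash('iamawesome', PASSWORD_BCRYPT, ['cost' => 12]);

// At login, compare the entered password against the stored string.
$ok = password_verify('iamawesome', $stored);

// After raising the cost (or switching algorithm), this reports which stored
// hashes should be regenerated the next time the user logs in successfully.
$needs_upgrade = password_needs_rehash($stored, PASSWORD_BCRYPT, ['cost' => 13]);
```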

But if you like to ignore good advice and decide to implement a password storage system yourself (seriously, don't do it!), at the very least remember to store your passwords the even better way by using salted, strengthened hashes!
