"Something you know" refers to authentication based on a human's knowledge of a
secret. The secret is usually a PIN (short numeric code), password
(short string), or passphrase (longer string); these differ primarily
in length. So henceforth I write "password", but what I discuss is
relevant to all three.
Passwords have a life cycle:
1. **Create:** user chooses password
2. **Store:** system stores password with user identifier
3. **Use:** user supplies password to authenticate
4. **Change (Recover/Reset):** user forgets; wants or needs to change
Let's discuss these stages of the life cycle in reverse order.
### Stage 4: Change
Users forget passwords. Systems (perhaps unwisely) cause passwords to
expire. Users want to change passwords. Passwords sometimes get
disclosed to attackers. How might that happen? They could be found on
post-it notes, revealed by the user themself (social engineering),
guessed by the attacker (online guessing), or *cracked* by the attacker
who obtains the password database (offline guessing). So recovery and
change are important parts of the password lifecycle. Unfortunately,
these parts tend to receive less attention than the rest of the system
and consequently might be poorly designed and tested. That makes them
attractive targets for attacks, rather than going after the (presumably
well-engineered) primary authentication mechanism.
Recognize that recovery is the authentication problem all over
again: the system must authenticate the user by some means other than a
password. Standard solutions on the web today exemplify that:
- **Answers to security questions:** this is "something you know", using
attributes of your identity established during enrollment. It trusts
that other humans can't masquerade as you—which is likely not true!
- **Sending an email with a link:** this is "something you know",
using a (one-time) password that expires immediately after use. It
trusts that your email account correctly authenticates you.
### Stage 3: Use
**Prompting for passwords.** Implicitly, there's always some kind of
*mutual* authentication going on. Before the human enters a password,
she has to decide whether to enter it. (Many people don't think
about this very carefully, contributing to the success of phishing
attacks.) Some systems go to greater lengths to achieve mutual
authentication:
- **Windows** has a *secure attention sequence* that provides a *trusted
path* to the OS: only Windows itself can receive the Ctrl+Alt+Del
key sequence. That makes it difficult for malicious code to pretend
it's the OS and spoof the login prompt.
- **Banks** sometimes use a *visual secret* technology. The user enters a
username, the bank displays that username's chosen secret, then the
user enters a password. The idea is to prevent phishing by causing
users to abort if the picture isn't right.
Visual secrets are trying to solve a UI problem in the
browser—enabling the user to easily identify the remote server.
Users are quite bad at detecting counterfeits. *Assumption: Users can't
discern whether they're really interacting with their bank or
not—for example, by looking at the browser's title bar.* Consider
a man-in-the-middle (MitM) attack, in which the attacker interposes
between the human and bank. The attacker learns the human's username,
forwards it on to the bank, receives back the user's visual secret, and
displays it to the user. The user then enters her password for the
attacker. GAME OVER. What would prevent this attack? Only the human
noticing that the website isn't really the bank's. But that contradicts our
initial assumption. So visual secrets don't actually, fully prevent
phishing attacks. Visual secrets do raise the bar, though, by making it
(a little) harder for attackers to mount phishing attacks.
**What should happen when password-based authentication fails?** I.e.,
how should an application react in response to successful or failed
authentication? It could be useful to tell successfully authenticated
users when they last logged in, the number of failed attempts since
then, and maybe even when those attempts were. When authentication
fails, the identity might be under online attack. Possible responses
include rate limiting future attempts to authenticate under that
identity, and eventually disabling the identity (though that creates an
availability attack). Informative error messages tend to hurt security,
because they reveal to attackers why authentication failed.
A related question is: should an application prompt for a username
first, or at the same time as the password? Prompting first creates a new
vulnerability, in that it enables attackers to guess valid usernames. So
it's better to prompt at the same time, and not give any indication of
which was wrong, the username or the password, if authentication fails.
(Note how visual secrets might disobey this rule.)
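The advice above can be sketched in code. The following Python sketch is a hypothetical illustration (the `password_file` store, `enroll`, and the use of PBKDF2 are assumptions, not a prescribed design): it returns the same error message whether the username or the password was wrong, and hashes even for unknown usernames so that response timing doesn't reveal which usernames are valid.

```python
# Sketch: uniform failure responses for login (hypothetical design).
import hashlib
import hmac
import os

password_file = {}  # uid -> (salt, hashed password)

def _hash(password: str, salt: bytes) -> bytes:
    # Iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def enroll(uid: str, password: str) -> None:
    salt = os.urandom(16)
    password_file[uid] = (salt, _hash(password, salt))

DUMMY_SALT = os.urandom(16)

def login(uid: str, password: str) -> str:
    entry = password_file.get(uid)
    if entry is None:
        # Hash anyway, so timing doesn't reveal whether the uid exists.
        _hash(password, DUMMY_SALT)
        return "invalid username or password"
    salt, stored = entry
    if hmac.compare_digest(_hash(password, salt), stored):
        return "ok"
    # Same message as the unknown-uid case: don't say which part was wrong.
    return "invalid username or password"
```

Note that the failure message is identical in both branches, so an attacker probing for valid usernames learns nothing from it.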
### Stage 2: Store
After creating a password, both the human and system must store it.
**Humans** don't have the storage capacity to remember lots of strong
passwords. So they naturally do a mixture of
- *reusing* passwords across various identities (with the consequence
  that an attacker who learns a password from one site can impersonate
  the human on others), and
- *recording* passwords in written or electronic form (with the
consequence that an attacker can search for that record rather than
brute-forcing the password).
The classical example of bad recording is writing down passwords on
post-it notes next to monitors. But there are some better ways of
managing recording. Humans are pretty good at securing pieces of paper,
called *cash*, in their wallet. So recording passwords in your wallet
might help. (Though loss of your wallet now has even greater impact than
before.) Various electronic password wallet applications are also
available, and can be very convenient to use—even making it easy to use
random passwords, one per identity. But those apps (and their data, if
any) now become targets.
**Systems** store passwords in a file (or a database). If that file is
sufficiently secured with appropriate access control mechanisms, then
storing plaintext passwords could in principle be okay. But real-world
experience teaches us that password files get leaked. So for Defense in
Depth, don't store plaintext passwords!
How can we protect passwords such that, even if the password file is
disclosed, attackers don't learn users' passwords? We want a function f
such that, if we store f(pass) for a password pass, then it's hard for
attackers to invert f(pass) and find pass. When we studied cryptography,
we called that property *one way*, and we saw that cryptographic
hash functions provide that property. (Encryption does too, but
encryption requires a key, whereas hashes do not.)
**Hashed Passwords**: For each user with username `uid` and password `pass`,
store `uid, H(pass)`. To authenticate a human Hu to a local machine L:
1. Hu -> L: uid, pass
2. L: if H(pass) is stored as uid's password, then uid is authenticated
During authentication, Hu is interacting with the local machine directly
(e.g., by a keyboard), not over a network.
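As a concrete illustration, here is a minimal Python sketch of the unsalted scheme above. The `password_file` dictionary, the `enroll` and `authenticate` names, and the choice of SHA-256 for H are all assumptions for illustration; as discussed below, real systems should add salt and a slow hash.

```python
# Sketch: storing uid, H(pass) and checking a supplied password.
import hashlib

password_file = {}  # uid -> H(pass)

def H(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def enroll(uid: str, password: str) -> None:
    # Store only the hash, never the plaintext password.
    password_file[uid] = H(password)

def authenticate(uid: str, password: str) -> bool:
    # uid is authenticated iff H(pass) matches what's stored for uid.
    return password_file.get(uid) == H(password)

enroll("alice", "correct horse")
print(authenticate("alice", "correct horse"))  # True
print(authenticate("alice", "wrong guess"))    # False
```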
With this scheme, if the attacker obtains the password file, he doesn't
immediately learn the plaintext passwords. But it's not a perfect
solution, because, besides inverting the hash function, there's another
way to learn passwords. An attacker can construct a *dictionary* mapping
hashed passwords to their plaintexts.
Given such a dictionary, and given a hashed-password file, the attacker
can simply look up the hashed passwords and find the plaintext
passwords. So the question becomes, how hard is it to construct a
dictionary? It turns out it's not all that hard. It just takes time and
space. Special data structures called *rainbow tables* can greatly
reduce the space requirements. It's even possible to buy or download
free, precomputed tables.
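A dictionary of this kind is simple to build: hash every candidate password once, then invert the mapping. A toy Python sketch (the candidate list stands in for a real wordlist):

```python
# Sketch: precomputing a dictionary from hashes back to plaintexts.
import hashlib

def H(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

candidates = ["password", "123456", "letmein", "qwerty"]
dictionary = {H(p): p for p in candidates}

# Given a leaked hashed-password file, cracking is just a lookup:
leaked_hash = H("letmein")
print(dictionary.get(leaked_hash))  # letmein
```

Rainbow tables trade some of this lookup speed for drastically less storage, but the principle is the same.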
Part of the reason why it's not hard to construct a dictionary is that
the set of passwords likely to be chosen by a human is much smaller than
the set of all possible passwords. According to [Bruce
Schneier](https://www.schneier.com/essay-148.html) (2007), quoting Eric
Thompson of AccessData,
1. The typical password comprises a root followed by an appendage.
2. The root is typically pronounceable, though not necessarily a
   dictionary word.
3. The appendage is usually a suffix (90%) or prefix (10%).
4. Most users have a 7–9 character root plus a shorter common
   appendage.
5. A dictionary of 1,000 roots plus 100 suffixes (= 100k passwords)
cracks about 24% of all passwords.
6. More sophisticated dictionaries (including initial/final uppercase,
and [leet speak](http://en.wikipedia.org/wiki/Leet) substitutions)
crack about 60% of passwords within 2–4 weeks.
7. Given biographical data (zip code, names, etc.) and other passwords
of a user, the success rate goes up a little, and the time goes down
to days or hours.
What countermeasures can defend against *dictionary attacks*?
**Idea 1: Slow down.** One reason that it's easy to construct
dictionaries is that cryptographic hash functions are designed to be
quick to compute. A hash function that was slow to compute would cause
table construction to take more time, making it more difficult for
attackers to succeed. There's a function called "scrypt" that is
designed in just this way. Another means to slow down computation is to
iterate the hash function, perhaps a thousand times:
z1 = H(pass);
z2 = H(pass, z1);
...
z1000 = H(pass, z999);
output z1 XOR z2 XOR ... XOR z1000
(This algorithm is almost the same as PBKDF2 from [RFC 2898](http://www.ietf.org/rfc/rfc2898.txt).)
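The iteration above translates directly into code. This Python sketch uses SHA-256 as a stand-in for H; it is an illustration of the iterate-and-XOR idea, not PBKDF2 itself:

```python
# Sketch: iterated hashing to slow down dictionary construction.
import hashlib

def H(*parts: bytes) -> bytes:
    # Hash the concatenation of the parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def iterated_hash(password: bytes, rounds: int = 1000) -> bytes:
    z = H(password)                 # z1 = H(pass)
    out = z
    for _ in range(rounds - 1):
        z = H(password, z)          # z_{i+1} = H(pass, z_i)
        out = bytes(a ^ b for a, b in zip(out, z))  # accumulate the XOR
    return out                      # z1 XOR z2 XOR ... XOR z_rounds
```

Each guess an attacker checks now costs `rounds` hash computations instead of one.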
Another term for this technique is *key stretching*. The number of times
to iterate can be a parameter. Over the lifetime of a system, the
parameter can be adjusted upwards to account for increasing attacker
computational power. When PBKDF2 was proposed in 2000, a minimum of
1,000 iterations was suggested. As of 2013, 10,000 is a minimum, and
20,000–40,000 is not unreasonable. Arguably, more iterations
than that [would not be as useful][agilebits] as requiring a single
extra randomly chosen character in the password.
**Idea 2: Add salt.** Dictionary attacks succeed because the password
space is so small. If we increased the size of that space we could make
the table size infeasibly big to compute or to store. The trick is to do
so without making users choose longer passwords. We can do that by
introducing a nonce, which is commonly called *salt*. Every user is
assigned their own unique salt. For each user with username `uid` and password `pass`,
store `uid, salt, H(pass, salt)`. To authenticate a human Hu to a local machine L:
1. Hu -> L: uid, pass
2. L: lookup salt for uid;
if H(pass, salt) is stored as uid's password, then uid is authenticated
Note that salt doesn't need to be unpredictable, just unique. But to
prevent password cracks on one system from being effective on another
system, salt does need to be unique across systems. Generating the salt
randomly is the easiest way to achieve that.
To combine salt with iterated hashing, just salt the first hash:
z1 = H(pass, salt);
z2 = H(pass, z1);
...
PBKDF2 [[RFC 2898](https://tools.ietf.org/html/rfc2898)] is an algorithm
that, as commonly used, does salted iterated hashing. It can be used to
prepare passwords for storage, or to derive a cryptographic key from a
password. WiFi WPA2 uses PBKDF2 to derive long-term keys from
passphrases; the network's SSID is used as the salt.
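Python's standard library exposes PBKDF2 directly as `hashlib.pbkdf2_hmac`. A small sketch (the iteration count here is illustrative, in line with the guidance above):

```python
# Sketch: deriving a key (or a storable password hash) with PBKDF2.
import hashlib
import os

salt = os.urandom(16)
key = hashlib.pbkdf2_hmac("sha256", b"my passphrase", salt, 100_000)
# With SHA-256, the derived key is 32 bytes by default.
print(len(key))  # 32
```

The same call serves both purposes mentioned above: store `(salt, key)` as the password record, or use `key` directly as a cryptographic key.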
**Time vs. space.**
As custom hardware (ASICs, GPUs) makes it possible to quickly compute many
hashes in parallel, time is no longer necessarily the bottleneck. So hash
designers are turning to **space** as another resource that can be used
to slow down hashes. Recent algorithms such as scrypt and Argon2 use this idea.
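Python's `hashlib` also exposes scrypt (when built against a recent OpenSSL). A sketch with illustrative cost parameters; `n` (CPU/memory cost), `r` (block size), and `p` (parallelism) together force the attacker to spend memory as well as time:

```python
# Sketch: memory-hard password hashing with scrypt.
import hashlib
import os

salt = os.urandom(16)
# n, r, p are illustrative; maxmem raised to accommodate the ~16 MB
# of working memory this parameter choice requires.
key = hashlib.scrypt(b"my passphrase", salt=salt,
                     n=2**14, r=8, p=1, maxmem=2**26, dklen=32)
print(len(key))  # 32
```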