# Passwords, part 2
### Stage 1: Create
Who creates the password?
- **The user:** This is the normal case. Unfortunately, users tend to
create weak passwords.
- **The system:** A strong password is generated by the system and
assigned to the user. Unfortunately, these passwords are rarely
memorable, making them inconvenient to use. Users might resort to
writing them down in places near their machines.
- **The administrator:** Administrators must either generate the
password or select it manually. Either way, we're back in one of the
cases above.
Here are the [top five examples of weak
passwords](http://splashdata.com/press/pr121023.htm) chosen by users in
2012:
1. password
2. 123456
3. 12345678
4. abc123
5. qwerty
Those are consistent with older password hacks. For example, in 2010,
Gawker Media (the parent of several big blog sites) was hacked. Of [250,000
disclosed
passwords](http://blogs.wsj.com/digits/2010/12/13/the-top-50-gawker-media-passwords/),
about 1% were "123456" and another 1% were "password".
All this raises the question: how can we characterize "strong"
passwords? They need to be passwords that are hard for attackers to
guess. It turns out we already have such a characterization from our
study of cryptography. Recall that the *security level* of an
algorithm is the base-2 logarithm of the maximum number of guesses
required to break the algorithm by brute-force attack. When we talked about
encryption schemes, the guesses were to find the key, and we implicitly
assumed that keys were chosen uniformly at random from the space of all
keys. For example, a 128-bit key is from a space that requires 2^128
guesses to search exhaustively.
**Using entropy to measure password strength.** We can apply the same idea,
the number of guesses required for a brute-force search, to passwords. But
passwords aren't bit strings; they're character strings. That makes the
math a little more complicated. Suppose there are N characters to choose
from, and the password is of length L. Then there are N^L possible
passwords. We want to find the security level H of that space. That is,
we want an H such that 2^H is equal to the number of possible
passwords. (Why use the letter H? Because the concept we're describing
is known in the field of Information Theory as *Shannon entropy*, for
which the letter H is traditionally used. And from now on, we'll write
"entropy" instead of "security level" when we're talking about
passwords.) Let's solve for that H:

$$
\begin{aligned}
N^L &= 2^H \\
\log N^L &= \log 2^H \\
L \log N &= H \log 2 \\
H &= L \, \frac{\log N}{\log 2} \\
H &= L \log_2 N
\end{aligned}
$$

So if passwords are chosen uniformly at random from the lower-case Latin
alphabet of 26 characters, the entropy of an 8-character password is
8 × lg 26 ≈ 37.6 bits (writing lg for log_2). That's very low compared to
the minimum security level for keys! Is it enough? According to a [2006 NIST
report](http://csrc.nist.gov/publications/nistpubs/800-63/SP800-63V1_0_2.pdf),
the minimum level is 14 bits, and 30 is comfortable. But that material
assumes an *online* attack model, in which attackers interactively guess
passwords. In an *offline* attack, in which attackers have direct access
to the password database, a higher level of security is necessary.
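
As a quick check of the formula H = L log_2 N, here is a minimal Python sketch. The function name `uniform_entropy_bits` is made up for illustration, and the result only holds under the assumption stated above: every password in the space is equally likely.

```python
import math


def uniform_entropy_bits(alphabet_size, length):
    """Entropy H = L * log2(N) of a password drawn uniformly at random."""
    if alphabet_size < 1 or length < 0:
        raise ValueError("alphabet_size must be >= 1 and length must be >= 0")
    return length * math.log2(alphabet_size)


# 8 characters from the 26-letter lower-case Latin alphabet.
print(round(uniform_entropy_bits(26, 8), 1))  # 37.6

# A 128-bit key is a string of length 128 over a 2-symbol alphabet.
print(uniform_entropy_bits(2, 128))  # 128.0
```

The second call shows that the password formula reduces to the familiar security level for keys when the alphabet is just {0, 1}.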
The entropy calculation above assumed that passwords are chosen uniformly
at random from the space of all passwords: for example, the password is
just as likely to be "iZ8\#j" as "12345". But humans just don't choose
randomly. So the entropy of human-chosen passwords is effectively much
less than it would be if the passwords were chosen by a machine.
Suppose, e.g., that the average high-school graduate has a vocabulary of
around 50,000 words [Nagy and Anderson; Pinker "The Language Instinct"].
What if this person chooses an English word as a password? There will be
lg 50k ≈ 15.6 bits of entropy. That's low! And it assumes that users
choose uniformly at random over their entire vocabulary, which isn't
likely either.
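
To see why non-uniform choice lowers entropy, the general Shannon formula H = -Σ p_i log_2 p_i can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the skewed probabilities are invented for the example, not measured from real users.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy -sum(p * log2(p)) of a discrete distribution, in bits."""
    if any(p < 0 for p in probabilities) or abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Four candidate passwords, chosen uniformly: log2(4) = 2 bits.
print(shannon_entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0

# The same four passwords, but one is a strong favorite (made-up numbers).
print(round(shannon_entropy_bits([0.7, 0.1, 0.1, 0.1]), 2))  # 1.36
```

With a uniform distribution over 50,000 words the same formula gives lg 50k ≈ 15.6 bits; any skew toward common words pushes the figure lower.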
The aforementioned NIST report uses the following heuristic for the
entropy of user-selected passwords drawn from the full keyboard:
- The entropy of the first character is 4 bits.
- The entropy of the next 7 chars is 2 bits per character.
- The entropy of the characters 9..20 is 1.5 bits per character.
- The entropy of characters 21+ is 1 bit per character.
- If the user is forced to use both upper-case and non-alphabetic
characters, give a flat bonus of 6 bits of entropy.
- Give a bonus of 0..6 bits, usually about 4, for checking against a
dictionary. This check must prevent simple transformations of any
word in an unabridged English dictionary of at least 50,000 words.
Other heuristics have been proposed, summarized in Schneider and in Bishop.
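
The NIST heuristic listed above can be written as a short function. This is a sketch of the rules exactly as summarized in these notes, not a reproduction of the NIST tables; the function and parameter names are hypothetical, and the dictionary bonus is left to the caller because the notes give only a 0..6 range.

```python
def nist_entropy_estimate(length, composition_rule=False, dictionary_bonus=0.0):
    """Estimate entropy (bits) of a user-chosen password of the given length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= dictionary_bonus <= 6.0:
        raise ValueError("dictionary_bonus must be between 0 and 6 bits")

    bits = 0.0
    for position in range(1, length + 1):
        if position == 1:
            bits += 4.0   # first character
        elif position <= 8:
            bits += 2.0   # characters 2..8
        elif position <= 20:
            bits += 1.5   # characters 9..20
        else:
            bits += 1.0   # characters 21 and beyond

    if composition_rule:
        bits += 6.0       # upper-case and non-alphabetic characters required
    return bits + dictionary_bonus


print(nist_entropy_estimate(8))                                  # 18.0
print(nist_entropy_estimate(8, composition_rule=True,
                            dictionary_bonus=4.0))               # 28.0
```

Compare the 18 bits estimated for an 8-character user-chosen password with the 37.6 bits of a uniformly random 8-letter one.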
"Simple transformations" above could include deleting vowels,
capitalizing some letters, adding suffixes or prefixes, replacing letters
with look-alike numbers, leet speak, and more.
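
A dictionary check that resists a few of these transformations might look like the sketch below. Everything here is an assumption made for illustration: the substitution table, the tiny word list, and the function names are invented, and a real checker would cover far more transformations against a dictionary of at least 50,000 words.

```python
import string

# Hypothetical look-alike substitutions to undo (digit or symbol -> letter).
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})
VOWELS = set("aeiou")


def strip_vowels(word):
    """Return the word with its vowels deleted."""
    return "".join(c for c in word if c not in VOWELS)


def looks_like_dictionary_word(password, words):
    """True if the password is a simple transformation of a word in `words`."""
    lowered = password.lower()  # undo capitalization
    # Undo prefixes and suffixes made of digits or punctuation.
    stripped = lowered.strip(string.digits + string.punctuation)
    candidates = {lowered, lowered.translate(LEET),
                  stripped, stripped.translate(LEET)}
    vowelless = {strip_vowels(w) for w in words}
    return any(c in words or c in vowelless for c in candidates)


words = {"password", "dragon"}  # stand-in for a real dictionary
print(looks_like_dictionary_word("P4ssw0rd!", words))   # True
print(looks_like_dictionary_word("Dragon2012", words))  # True
print(looks_like_dictionary_word("drgn", words))        # True
print(looks_like_dictionary_word("iZ8#j", words))       # False
```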
**Beyond entropy.**
Weir et al. (2010) show experimentally that the NIST entropy estimates
don't do a good job of predicting how long it will take attackers to
crack passwords. Kelley et al. (2012) show that, despite the Weir et
al. result, passwords chosen according to the most comprehensive NIST
requirements (mixtures of character kinds, no dictionary words,
sufficiently long, etc.) are indeed the passwords that are hardest to
crack—call these *comprehensive* passwords. So the NIST
recommendations reach the right conclusion, even if the metric they use
isn't valid. But comprehensive passwords are hard to remember and hated
by users, leading them to reuse passwords or predictably modify
passwords. Could we do better? Here are three options that have been
explored:
* **Mandatory randomness:** The user chooses part of the password
(perhaps poorly); the system chooses the other part of the password
(randomly). The system can mitigate weak choices by users, but there is
a danger that users begin choosing even weaker passwords in reaction to
the mandated randomness.
* **Passphrases:** Users choose long passwords but without any
requirements as to the kinds of characters used. Kelley et al. show
that passphrases are nearly as hard to crack as comprehensive
passwords, and Komanduri et al. (2011) show that passphrases
are easier for humans to create and remember.
* **Password wallets or managers:** Users store their passwords in an
electronic wallet, which they open with a single master password. This
technique enables users to have many high-strength passwords while
having to remember only one.
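
As an illustration of the mandatory randomness option, the sketch below appends system-chosen characters to whatever the user picked. The alphabet, the default of four random characters, and the choice to append rather than interleave are all assumptions for the example; the notes do not prescribe a particular scheme.

```python
import math
import secrets
import string

# Assumed alphabet for the system-chosen part: 62 letters and digits.
ALPHABET = string.ascii_letters + string.digits


def add_mandatory_randomness(user_part, random_length=4):
    """Append `random_length` uniformly random characters to the user's part."""
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(random_length))
    return user_part + random_part


def added_entropy_bits(random_length=4):
    """Entropy contributed by the random part alone: L * log2(N)."""
    return random_length * math.log2(len(ALPHABET))


print(add_mandatory_randomness("hunter"))  # e.g. "hunter" plus 4 random characters
print(round(added_entropy_bits(4), 1))     # 23.8
```

Even if the user's part is trivially guessable, the random part contributes its own entropy, which is the sense in which the system mitigates weak user choices.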
### Beyond passwords
Could we replace passwords with a different authentication mechanism?
Bonneau et al. (2012) develop criteria against which to judge
proposed new mechanisms:
* Security:
- Physical observation: shoulder surfing, video recording, sound recording, thermal imaging
- Targeted impersonation: acquaintance or skilled investigator
- Online guessing: server constrains per-user rate of guess attempts
- Offline guessing: attacker's guessing rate is constrained only by available computational resources
- Internal observation: attacker compromises the channels (e.g., keyboard, SSL), even defeating any crypto on them
- Leaks: compromise at one account doesn't affect others
- Phishing: simulation of the real server doesn't yield credentials usable at the real server
- Theft: physical object can't be used by another user
- Trusted third party: none
- Privacy: explicit consent, unlinkable
* Usability:
- Memoryless: humans don't have to remember secrets
- Scalable for users: any burden should scale to hundreds of accounts
- Nothing to carry: no hardware required, or at least none beyond what the user already always carries
- Physically effortless: no typing or physical motions, or at least not beyond pushing a button or speaking
- Easy to learn: no training or reminding
- Efficient: time to authenticate is short, and time to enroll is at least reasonable
- Infrequent errors: low false reject rate
- Easy recovery from loss: including latency, convenience, and assured recovery
* Deployability:
- Accessible: physical disabilities and conditions don't prevent use
- Cost: negligible per user, or at least plausible for startups with no per-user revenue stream
- Server compatible: nothing special required on server/verifier end
- Browser compatible: doesn't require (non-standard) plugins
- Mature: well tested and fielded beyond research
- Non-proprietary: published openly and not encumbered by IP
Evaluating many proposed schemes for replacing passwords, Bonneau et al. conclude
that though they generally offer better security, they tend to offer worse
deployability, and usability is sometimes better and sometimes worse. It
seems that passwords are here to stay, at least for now. Bonneau et al. observe
that most of the schemes that compare favorably to passwords involve
single sign on...
### Single sign on
With **single sign on** (SSO), a user enrolls with many service providers (SPs),
shares authentication secrets (e.g., a password) with each SP, but authenticates
only once to the SSO service. Thereafter, the SSO manages authentication.
Note that the SSO can trivially impersonate the user: the SSO has to be trusted.
Variants of SSO include **true SSO**, in which the SSO does authentication
and the SPs simply trust the SSO when it asserts the identity of a user,
and **pseudo SSO**, in which the SSO impersonates the user to the SP
through the SP's own native authentication mechanism. Either way, the SSO
could be **local** to the user's machine or could be running as a remote
or **proxy** service.
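
The difference between the two variants can be made concrete with a toy model. This is a sketch under heavy assumptions, not how Kerberos or any real SSO works: the class and method names are invented, the true-SSO assertion is modeled as an HMAC under a key shared between SSO and SP, there is no timestamp or replay protection, and passwords are kept in plaintext only to keep the example short.

```python
import hashlib
import hmac
import secrets


def _tag(key, sp_name, user):
    """MAC binding a user name to one SP (no nonce or timestamp in this toy)."""
    return hmac.new(key, f"{sp_name}:{user}".encode(), hashlib.sha256).digest()


class ServiceProvider:
    def __init__(self, name):
        self.name = name
        self.sso_key = secrets.token_bytes(32)  # shared with the SSO (assumption)
        self._passwords = {}                    # user -> password (toy storage)

    def enroll(self, user, password):
        self._passwords[user] = password

    def native_login(self, user, password):
        """The SP's own password check, as used by pseudo SSO."""
        stored = self._passwords.get(user)
        return stored is not None and hmac.compare_digest(
            stored.encode(), password.encode())

    def accept_assertion(self, user, tag):
        """True SSO: trust the SSO's assertion if its MAC verifies."""
        return hmac.compare_digest(_tag(self.sso_key, self.name, user), tag)


class ToySSO:
    def __init__(self):
        self._signed_in = set()
        self._vault = {}  # (user, SP name) -> that SP's password

    def sign_in(self, user):
        self._signed_in.add(user)  # stand-in for the one real authentication

    def remember(self, user, sp, password):
        self._vault[(user, sp.name)] = password

    def pseudo_login(self, user, sp):
        """Impersonate the user through the SP's native mechanism."""
        password = self._vault.get((user, sp.name))
        return (user in self._signed_in and password is not None
                and sp.native_login(user, password))

    def true_login(self, user, sp):
        """Assert the user's identity; the SP never sees a password."""
        return (user in self._signed_in
                and sp.accept_assertion(user, _tag(sp.sso_key, sp.name, user)))


sp = ServiceProvider("mail")
sp.enroll("alice", "correct horse")
sso = ToySSO()
sso.remember("alice", sp, "correct horse")
sso.sign_in("alice")
print(sso.pseudo_login("alice", sp), sso.true_login("alice", sp))  # True True
```

In both paths nothing stops `ToySSO` from logging in as alice whenever it likes, which is the trust assumption noted above.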
Password managers are an example of a typically local pseudo SSO offering
a limited degree of automation. Browsers that remember passwords and sync
them across machines are an example of something approaching a proxy pseudo SSO.
Examples of proxy true SSOs include Kerberos and third-party authentication by
Google/Facebook credentials. Local true SSOs are harder to exemplify, as they
necessitate the remote SP trusting the user's machine not to lie about the user's
identity; a trusted cryptographic co-processor might be needed here to ensure
that the user cannot subvert the local SSO.