CS 5430: Passwords

# Passwords

"Something you know" refers to authentication based on a human's knowledge of a secret. The secret is usually a PIN (short numeric code), password (short string), or passphrase (longer string); these differ primarily in length. So henceforth I write "password", but what I discuss is relevant to all three.

Passwords have a life cycle:

1. **Create:** user chooses password
2. **Store:** system stores password with user identifier
3. **Use:** user supplies password to authenticate
4. **Change (Recover/Reset):** user forgets; wants or needs to change

Let's discuss these stages of the life cycle in reverse order.

## Stage 4: Change

Users forget passwords. Systems (perhaps unwisely) cause passwords to expire. Users want to change passwords. Passwords sometimes get disclosed to attackers. How might that happen? They could be found on post-it notes, revealed by the user themself (social engineering), guessed by the attacker (online guessing), or *cracked* by an attacker who obtains the password database (offline guessing).

So recovery and change are important parts of the password life cycle. Unfortunately, these parts tend to receive less attention than the rest of the system and consequently might be poorly designed and tested. That makes them attractive targets: attacking recovery can be easier than going after the (presumably well-engineered) primary authentication mechanism.

Recognize that recovery is the authentication problem all over again: the system must authenticate the user by some means other than a password. Standard solutions on the web today exemplify that:

- **Answers to security questions:** this is "something you know", using attributes of your identity established during enrollment. It trusts that other humans can't masquerade as you, which is likely not true!
- **Sending an email with a link:** this is "something you know", using a (one-time) password that expires immediately after use. It trusts that your email account correctly authenticates you.
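The emailed-link approach can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed design: the in-memory store, the 15-minute lifetime, and the function names `issue_reset_token` and `redeem_reset_token` are all hypothetical. The link carries a long random token; the server keeps only a hash of it and deletes the entry on the first redemption attempt, so the token works at most once.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: uid -> (SHA-256 of token, expiry time).
_pending_resets: Dict[str, Tuple[bytes, float]] = {}

TOKEN_LIFETIME_SECONDS = 15 * 60  # assumed lifetime; a real policy may differ


def issue_reset_token(uid: str, now: Optional[float] = None) -> str:
    """Create a random one-time token to embed in the emailed link."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # 32 random bytes, as URL-safe text
    # Keep only a hash, so a leaked copy of this store doesn't reveal live tokens.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    _pending_resets[uid] = (digest, now + TOKEN_LIFETIME_SECONDS)
    return token


def redeem_reset_token(uid: str, token: str, now: Optional[float] = None) -> bool:
    """Return True at most once per issued token, and only before it expires."""
    now = time.time() if now is None else now
    entry = _pending_resets.pop(uid, None)  # any attempt consumes the token
    if entry is None:
        return False
    digest, expires_at = entry
    if now > expires_at:
        return False
    candidate = hashlib.sha256(token.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Consuming the token on any attempt, even a wrong one, means a bad guess can invalidate a legitimate reset; keeping it until a successful redemption is the other reasonable choice, since 32 random bytes cannot feasibly be guessed. Either way, the scheme still trusts the email account, as noted above.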
## Stage 3: Use

**Prompting for passwords.** Implicitly, there's always some kind of *mutual* authentication going on. Before the human enters a password, she has to decide whether to enter it. (Many people don't think about this very carefully, contributing to the success of phishing attacks.) Some systems go to greater lengths to achieve mutual authentication:

- **Windows** has a *secure attention sequence* that provides a *trusted path* to the OS: only Windows itself can receive the Ctrl+Alt+Del key sequence. That makes it difficult for malicious code to pretend it's the OS and spoof the login prompt.
- **Banks** sometimes use a *visual secret* technology. The user enters a username, the bank displays that username's chosen secret picture, then the user enters a password. The idea is to prevent phishing by causing users to abort if the picture isn't right.

Visual secrets are trying to solve a UI problem in the browser: enabling the user to easily identify the remote server. Users are quite bad at detecting counterfeits.

*Assumption: Users can't discern whether they're really interacting with their bank or not (for example, by looking at the browser's title bar).*

Consider a man-in-the-middle (MitM) attack, in which the attacker interposes between the human and the bank. The attacker learns the human's username, forwards it to the bank, receives back the user's visual secret, and displays it to the user. The user then enters her password for the attacker. GAME OVER.

What would prevent this attack? Only the human noticing that the website isn't really the bank's. But that contradicts our initial assumption. So visual secrets don't fully prevent phishing attacks. They do raise the bar, though, by making it (a little) harder for attackers to mount phishing attacks.

**What should happen when password-based authentication fails?** That is, how should an application react to successful or failed authentication?
It could be useful to tell successfully authenticated users when they last logged in, the number of failed attempts since then, and maybe even when those attempts were.

When authentication fails, the identity might be under online attack. Possible responses include rate limiting future attempts to authenticate under that identity, and eventually disabling the identity (though that creates an availability attack). Informative error messages tend to hurt security here.

A related question is: should an application prompt for a username first, or at the same time as the password? Prompting first creates a new vulnerability, in that it enables attackers to guess valid usernames. So it's better to prompt at the same time, and not give any indication of which was wrong, the username or the password, if authentication fails. (Note how visual secrets might disobey this rule.)

## Stage 2: Store

After creating a password, both the human and the system must store it.

**Humans** don't have the storage capacity to remember lots of strong passwords. So they naturally do a mixture of

- *reusing* passwords across various identities (with the consequence that an attacker who learns a password from one site can impersonate the human on others), and
- *recording* passwords in written or electronic form (with the consequence that an attacker can search for that record rather than brute-forcing the password).

The classical example of bad recording is writing down passwords on post-it notes next to monitors. But there are some better ways of managing recording. Humans are pretty good at securing pieces of paper, called *cash*, in their wallets. So recording passwords in your wallet might help. (Though loss of your wallet now has even greater impact than before.) Various electronic password wallet applications are also available, and can be very convenient to use, even making it easy to use random passwords, one per identity. But those apps (and their data, if any) now become targets.
**Systems** store passwords in a file (or a database). If that file is sufficiently secured with appropriate access control mechanisms, then storing plaintext passwords could in principle be okay. But real-world experience teaches us that password files get leaked. So for Defense in Depth, don't store plaintext passwords!

How can we protect passwords such that, even if the password file is disclosed, attackers don't learn users' passwords? We want a function f such that, if we store f(pass) for a password pass, then it's hard for attackers to invert f(pass) and find pass. When we studied cryptography, we called that property *one way*, and we saw that cryptographic hash functions provide that property. (Encryption does too, but encryption requires a key, whereas hashes do not.)

**Hashed Passwords**: For each user with username `uid` and password `pass`, store `uid, H(pass)`. To authenticate a human Hu to a local machine L:

```
1. Hu -> L: uid, pass
2. L: if H(pass) is stored as uid's password, then uid is authenticated
```

During authentication, Hu is interacting with the local machine directly (e.g., by a keyboard), not over a network.

With this scheme, if the attacker obtains the password file, he doesn't immediately learn the plaintext passwords. But it's not a perfect solution, because, besides inverting the hash function, there's another way to learn passwords. An attacker can construct a *dictionary* mapping hashes back to their plaintext passwords:

```
H(pass1), pass1
H(pass2), pass2
H(pass3), pass3
...
```

Given such a dictionary and a hashed-password file, the attacker can simply look up the hashed passwords and find the plaintext passwords.

So the question becomes: how hard is it to construct a dictionary? It turns out it's not all that hard. It just takes time and space. It's even possible to buy or download free, precomputed tables.
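To make the hashed-password scheme and the dictionary attack concrete, here is a minimal sketch. It assumes SHA-256 as H; the accounts, passwords, and candidate lists are invented for illustration, and the candidate list is only a toy stand-in for a real cracking dictionary.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def H(pw: str) -> str:
    # SHA-256 stands in for the unspecified hash function H (an assumption).
    return hashlib.sha256(pw.encode("utf-8")).hexdigest()


# System side: the password file holds uid, H(pass), never pass itself.
# These accounts and passwords are made up for illustration.
password_file: Dict[str, str] = {
    "alice": H("sunshine1"),
    "bob": H("Tr0ub4dor&3"),
}


def authenticate(uid: str, pw: str) -> bool:
    """Step 2 of the protocol: is H(pass) stored as uid's password?"""
    stored = password_file.get(uid)
    return stored is not None and hmac.compare_digest(stored, H(pw))


# Attacker side: precompute H(candidate) -> candidate once...
def build_dictionary(candidates: Iterable[str]) -> Dict[str, str]:
    return {H(c): c for c in candidates}


# ...then look up every hash in a leaked copy of the password file.
def crack(leaked_file: Dict[str, str], dictionary: Dict[str, str]) -> Dict[str, str]:
    return {uid: dictionary[h] for uid, h in leaked_file.items() if h in dictionary}


roots = ["sunshine", "password", "dragon"]
suffixes = ["", "1", "123", "!"]
dictionary = build_dictionary(r + s for r in roots for s in suffixes)
print(crack(password_file, dictionary))  # prints {'alice': 'sunshine1'}
```

Because nothing user-specific goes into H, the same dictionary works unchanged against every unsalted password file that uses the same hash, which is what makes precomputed tables worth building and distributing.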
Part of the reason why it's not hard to construct a dictionary is that the set of passwords likely to be chosen by a human is much smaller than the set of all possible passwords. According to [Bruce Schneier](https://www.schneier.com/essay-148.html) (2007), quoting Eric Thompson of AccessData,

1. The typical password comprises a root plus an appendage.
2. The root is typically pronounceable, though not necessarily a dictionary word.
3. The appendage is usually a suffix (90%) or prefix (10%).
4. Most users have a 7–9 character root plus a shorter common appendage.
5. A dictionary of 1,000 roots plus 100 suffixes (= 100k passwords) cracks about 24% of all passwords.
6. More sophisticated dictionaries (including initial/final uppercase, and [leet speak](http://en.wikipedia.org/wiki/Leet) substitutions) crack about 60% of passwords within 2–4 weeks.
7. Given biographical data (zip code, names, etc.) and other passwords of a user, the success rate goes up a little, and the time goes down to days or hours.

What countermeasures can defend against *dictionary attacks*?

**Idea 1: Slow down.** One reason that it's easy to construct dictionaries is that cryptographic hash functions are designed to be quick to compute. A hash function that was slow to compute would cause table construction to take more time, making it more difficult for attackers to succeed. There's a function called "scrypt" that is designed in just this way.

Another means to slow down computation is to iterate the hash function, perhaps a thousand times:

```
z1 = H(pass);
z2 = H(pass, z1);
...
z1000 = H(pass, z999);
output z1 XOR z2 XOR ... XOR z1000
```

(This algorithm is almost the same as PBKDF2 from [RFC 2898](http://www.ietf.org/rfc/rfc2898.txt).) Another term for this technique is *key stretching*. The number of times to iterate can be a parameter. Over the lifetime of a system, the parameter can be adjusted upwards to account for increasing attacker computational power.
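The iterated pseudocode above can be sketched directly in Python. This is an illustration of the notes' algorithm, not PBKDF2 itself, and it rests on two assumptions: the one-argument H(pass) is SHA-256, and the two-argument H(pass, z) is HMAC-SHA256 keyed with the password (which is how PBKDF2 chains its rounds). The name `iterated_hash` is hypothetical.

```python
import hashlib
import hmac
import time


def iterated_hash(password: str, iterations: int = 1000) -> bytes:
    """Key stretching as in the pseudocode: z1 = H(pass), zi = H(pass, z(i-1)),
    output z1 XOR z2 XOR ... XOR zn.

    Assumptions: H(pass) is SHA-256, and H(pass, z) is HMAC-SHA256 keyed
    with the password.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    pw = password.encode("utf-8")
    z = hashlib.sha256(pw).digest()  # z1 = H(pass)
    out = z                          # running XOR of z1..zi
    for _ in range(iterations - 1):
        z = hmac.new(pw, z, hashlib.sha256).digest()  # zi = H(pass, z(i-1))
        out = bytes(a ^ b for a, b in zip(out, z))
    return out


if __name__ == "__main__":
    # Cost grows linearly with the iteration parameter, for defender and attacker alike.
    for n in (1_000, 10_000):
        t0 = time.perf_counter()
        iterated_hash("sunshine1", n)
        print(n, "iterations:", round(time.perf_counter() - t0, 4), "s")
```

Each dictionary entry now costs the attacker `iterations` hash computations instead of one, and so does each legitimate login, which is why the parameter must be tuned rather than maximized. For real systems, Python's standard library provides the actual algorithm as `hashlib.pbkdf2_hmac(hash_name, password, salt, iterations)`, which also takes a salt (Idea 2 below).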
When PBKDF2 was proposed in 2000, a minimum of 1,000 iterations was suggested. As of 2013, 10,000 is a minimum, and 20,000–40,000 is not unreasonable. Arguably, more iterations than that [would not be as useful][agilebits] as requiring a single extra randomly chosen character in the password.

[agilebits]: http://blog.agilebits.com/2013/04/16/1password-hashcat-strong-master-passwords

**Idea 2: Add salt.** Dictionary attacks succeed because the password space is so small. If we increased the size of that space, we could make the table infeasibly big to compute or to store. The trick is to do so without making users choose longer passwords. We can do that by introducing a nonce, which is commonly called *salt*. Every user is assigned their own unique salt. For each user with username `uid` and password `pass`, store `uid, salt, H(pass, salt)`. To authenticate a human Hu to a local machine L:

```
1. Hu -> L: uid, pass
2. L: lookup salt for uid;
      if H(pass, salt) is stored as uid's password, then uid is authenticated
```

Note that salt doesn't need to be unpredictable, just unique. But to prevent password cracks on one system from being effective on another system, salt does need to be unique across systems. Generating the salt randomly is the easiest way to achieve that.

To combine salt with iterated hashing, just salt the first hash:

```
z1 = H(pass, salt);
z2 = H(pass, z1);
...
```

PBKDF2 [[RFC 2898](https://tools.ietf.org/html/rfc2898)] is an algorithm that, as commonly used, does salted iterated hashing. It can be used to prepare passwords for storage, or to derive a cryptographic key from a password. WiFi WPA2 uses PBKDF2 to derive long-term keys from passphrases; the network's SSID is used as the salt.

## Exercises

1. Find out, if possible, how your OS stores your password.

2. Consider the following three protocols for authenticating a human Hu to a remote machine S using a password entered at a local machine L.

   ```
   PROTOCOL 1
   1. Hu -> L: uid, pass
   2. L -> S (over unsecured channel): uid, H(pass)
   3. S: if H(pass) is stored as uid's password, then uid is authenticated

   PROTOCOL 2
   1. Hu -> L: uid, pass
   2. L and S: establish secure channel
   3. L -> S (over secured channel): uid, pass
   4. S: if H(pass) is stored as uid's password, then uid is authenticated

   PROTOCOL 3
   1. Hu -> L: uid, pass
   2. L and S: establish secure channel
   3. L -> S (over secured channel): uid, H(pass)
   4. S: if H(pass) is stored as uid's password, then uid is authenticated
   ```

   * Explain which of the three protocols are vulnerable to replay attacks, and which are not.
   * Compared to Protocol 2, is there any advantage or disadvantage to hashing the password in Protocol 3?

3. A data structure called a *rainbow table* can be used to make password cracking more efficient. Read about the [theory][oech03] of rainbow tables, then download the [RainbowCrack][rainbowcrack] tool. Use the tool to crack this password hash: `efe8b71fbb429db5a78c193110e4edf1bedd6315`. It is the SHA-1 hash of an unsalted password that is at most 10 characters long and consists only of lowercase letters. The first student to approach Prof. Clarkson *in the lecture room* and give the password will get a gift certificate to Gimme for a free coffee.

4. [S5] discusses *pepper*, which is a technique that is complementary to salt. Explain the tradeoffs involved with pepper. Most systems do not use it; why do you think that is?

5. As custom hardware (ASICs, GPUs) makes it possible to quickly compute many hashes in parallel, time is no longer necessarily the bottleneck. So hash designers are turning to space (memory) as another resource that can be used to slow down hashes. Recent algorithms such as scrypt and Argon2 use this idea. Do some research to find out how one of those algorithms works.

[oech03]: https://lasec.epfl.ch/pub/lasec/doc/Oech03.pdf
[rainbowcrack]: http://project-rainbowcrack.com/
[S5]: http://www.cs.cornell.edu/fbs/publications/chptr.AuthPeople.pdf