CS 513 System Security -- Something You Know, Have, or Are

Something You Know, Have, or Are

Lecturer: Professor Fred B. Schneider

Lecture notes by Tom Roeder

Methods for authenticating people differ significantly from those for authenticating machines and programs, and this is because of the major differences in the capabilities of people versus computers. Computers are great at doing large calculations quickly and correctly, and they have large memories into which they can store and later retrieve Gigabytes of information. Humans don't. So we need to use different methods to authenticate people. In particular, the cryptographic protocols we've already discussed are not well suited if the principal being authenticated is a person (with all the associated limitations).

All approaches for human authentication rely on at least one of the following:

Something you know (eg. a password). This is the most common kind of authentication used for humans. We use passwords every day to access our systems. Unfortunately, something that you know can become something you just forgot. And if you write it down, then other people might find it.
Something you have (eg. a smart card). This form of human authentication removes the problem of forgetting something you know, but some object now must be with you any time you want to be authenticated. And such an object might be stolen and then becomes something the attacker has.
Something you are (eg. a fingerprint). Base authentication on something intrinsic to the principal being authenticated. It's much harder to lose a fingerprint than a wallet. Unfortunately, biometric sensors are fairly expensive and (at present) not very accurate.

We now explore each category in depth.

Something You Know

The idea here is that you know a secret --- often called a password --- that nobody else does. Thus, knowledge of a secret distinguishes you from all other individuals. And the authentication system simply needs to check to see if the person claiming to be you knows the secret.

Unfortunately, use of secrets is not a panacea. If the secret is entered at some sort of keyboard, an eavesdropper ("shoulder surfing") might see the secret being typed. For authenticating machines, we used challenge/response protocols to avoid sending a secret (key) over the wire where it could be intercepted by a wiretapper. But we can't force humans to engage in a challenge/response protocol on their own, because people cannot be expected to do cryptographic calculations.

Furthermore, people will tend to choose passwords that are easy to remember, which usually means that the password is easy to guess. Or they choose passwords that are difficult to guess but are also difficult to remember (so the passwords must be written down and then are easy for an attacker to find).

Even if a password is not trivial to guess, it might succumb to an offline search of the password space. An offline search needs some way to check a guess without using the system itself, and some methods used today for storing passwords do provide such a way. (See below.)

Finally, changing a password requires human intervention. Thus, compromised passwords could remain valid for longer than is desirable. And there must be some mechanism for resetting the password (because passwords will get forgotten and compromised). This mechanism could itself be vulnerable to social-engineering attacks, which rely on convincing a human with the authority to change or access information that it is necessary to do so.

With all these concerns about passwords, you might wonder what is required for a password to be considered a good one. There are three dimensions, and they interact so that strengthening one can be used to offset a weakness in another.

Length. This is the easiest dimension for people to strengthen. Longer passwords are better. A good way to get a long password that is seemingly random yet easy to remember is to think of a passphrase (like the first words of a song) and then generate the password from the first letters of the passphrase.
Character set. The more characters that can be used in a password, the greater the number of possible combinations of characters, so the larger the password space. To search a larger password space require doing more work by an attacker.
Randomness. Choose a password from a language (English, say) and an attacker can leverage regularities in this language to reduce the work needed in searching the password space (because certain passwords are now "impossible"). For instance, given the phonotactic and orthographic constraints of English, an attacker searching for an English word need not try passwords containing sequences like krz (although this would be a perfectly reasonable to try if the password was known to be in Polish). Mathematically, it turns out that English has about 1.3 bits of information per character. Thus it takes 49 characters to get 64 bits of "secret", which comes out to about 10 words (at 5 characters on average per word).

When passwords are used for authenticating a user, the system must have a way to check whether the password entered is valid. Simply storing a file with the list of usernames and associated passwords, however, is a bad idea because if the confidentiality of this file were ever compromised all would be lost. (Similarly, backup copies of this file would have to be afforded the same level of protection, since people rarely ever change their passwords.) Better not to store actual passwords on-line. So instead we might compute a cryptographic hash of the password, and store that. Now, the user enters a password; the system computes a hash of that password; and the system then compares that hash with what has been stored in the password file.

Even when password hashes instead of actual passwords are what is being stored, the integrity of this file of hashes must still be protected. Otherwise an attacker could insert a different hash (for a password the attacker knows) and log into the system using that new password.

The problem with having a password file that is not confidential --- even if cryptographic hashes are what is being stored --- is the possibility of offline dictionary attacks. Here, the attacker computes the hash of every word in some dictionary and then compares each hash with the stored password hashes. If any match, the attacker has learned a password. An alternative to confidentiality for defending against offline dictionary attacks is use of salt. Salt is a random number that is associated with a user and is added to that user's password when the hash is computed. With high probability, a given pair of users will not have the same salt value. And the system stores both h(password + salt) and the salt for each account.

Salt does not make it more difficult for an attacker to guess the password for a given account, since the salt for each account is stored in the clear. What salt does, however, is make it harder for the attacker to perpetrate an offline dictionary attack against all users. When salt is used, all the words in the dictionary would have to be rehashed for every user. What formerly could be seen as a "wholesale" attack has been transformed into a "retail" one.

Salt is used in most UNIX implementations. The salt in early versions of UNIX was 12 bits, and it was formed from the system time and the process identifier when an account is created. Unfortunately, 12 bits is hopelessly small, nowadays. Even an old PC can perform 13,000 crypt/sec, which means such a PC so can hash a 20k word dictionary with every possible value of a 12 bit salt in 1 hour.

Secret Salt

Another defense against offline dictionary attacks is to use secret salt (invented by Manber and independently by Abadi and Needham). In this scheme, we select a small set of possible "secret salt" values from a large space. The password file then stores for each user: userid, h(password, public salt, secret salt), public salt. Note that the value of the secret salt used in computing the hash is not saved anyplace. When secret salt is being employed, a user login involves having the system guess the value of secret salt that was used in computing the stored, hashed password; the guess involves checking through the possible secret salt values. The effect is to make computing a hashed password very expensive for attackers.

Examples of Password Systems

We now outline several widely-used password systems.

Unix. Unix stores a hashed salted password and salt. For the hash, it iterates DES 25 times with an input of "0" and with the password as the key; it then adds the 12-bit salt. As discussed above, this is not strong enough for today's machines. Some versions of Unix employ a shadow password file, so that it is harder for an attacker to retrieve the hashed passwords. There are then two files: /etc/shadow and /etc/master.password.
FreeBSD. FreeBSD stores a hashed password (where the hash is based on MD5). There is no limit to the length of the password, and 48 bits of salt are used.
OpenBSD. OpenBSD does a hash based on blowfish encryption, and then stores the hashed password along with 128 bits of salt. The system guarantees that no two accounts will have the same salt value.
Windows NT/2000/XP. NT stores 2 password hashes: one called the LanMan hash and another called the NT hash. The LanMan hash is used for backwards compatibility with Windows 95/98, and it is a very weak scheme. The following diagram shows how it works.

To see the weakness, consider how much work an attacker would have to do to break this scheme. The numbers and uppercase letters together make up 36 characters. Each half of a 14-character password then has 36⁷ possible values, which comes out as 78,364,164,096. The actual work factor then is 2 x 36⁷ (whereas the theoretical work factor for 14 characters is 36¹⁴ = 36⁷ x 36⁷).

Note that if upper and lower case were both allowed, then there would be (2 x 26) + 10 = 62 possible characters and thus 62⁷ = 3,512,614,606,208 possible values, which is 100 times greater than the LanMan value.

The NT hash is somewhat better. In the NT operating system, there was still a 14 character limit, although this limit was removed in Windows 2000 and XP. The password is then passed through 48 iterations of MD4 to get a 128 bit hash. This hash is stored in the system, but no salt is used at all.

Defense Against Password Theft: A Trusted Path

Given schemes that make passwords hard to guess, an attacker might be tempted to try theft. The attack is: install some some sort of program to produce a window that resembles a login prompt or otherwise invites the user to reveal a password. Users will then type their passwords into this program, where the password is saved for later use by the attacker.

How can you defend against such attacks? What we would like is some way for a user to determine the pedigree of any window purporting to be a login prompt. If each point in the pedigree is trusted, then the login prompt window must be trusted and it is safe to enter a password. This idea is called a trusted path.

To implement a trusted path, the keyboard driver recognizes a certain key sequence (Ctl-Alt-Del in Windows) and always then transfers control to some trusted software that displays a (password prompt) window and reads the contents. Users are educated to type passwords only into windows that appear after typing that special key sequence.

Notice, however, that this scheme requires that a trusted keyboard driver is executing. So, that means the system must be running an operating system that is trusted to prevent keyboard driver substitutions. One might expect that rebooting the machine would be a way to ensure that a trusted operating system is executing (presuming you trust whatever operating system is installed), but what if the OS image on the disk had been altered by an attacker? So, one must be certain that the operating system software stored on the disk has not been modified, too. But even that's not enough. What about the boot loader, which might have been altered to read a boot block from a non-standard location on the disk? And so it goes. Even if you start each session by booting from your own fresh OS CD, a ROM or even the hardware might have been hacked by an attacker. Physical security of the hardware then must also have been maintained. In the end, though, to the extent that you can trust all layers from the hardware to the keyboard driver, the resulting trusted path provides a way to defend against attacks implemented by programs that attempt to steal passwords by spoofing.

Something You Have

Instead of basing authentication on something a principal knows and can forget, maybe we should base it on something the principal has. Various token/card technologies support authentication along these lines. For all, 2-factor authentication becomes important --- an authentication process that involves 2 independent means of authenticating the principal. So, we might require that a principal not only possess a device but also know some secret password (often known as a PIN, or personal identification number). Without 2-factor authentication, stealing the device would allow an attacker to impersonate the owner of the device; with 2-factor authentication, the attacker would still have another authentication burden to overcome.

Here are examples of technologies for authentication based on something a principal might possess:

A magnetic strip card. (eg. Cornell ID, credit card) One serious problem with these cards is that they are fairly easy to duplicate. It only costs about $50 to buy a writer, and it's easy to get your hands on cards to copy them. To get around these problems, banks implement 2-factor authentication by requiring knowledge of a 4 to 7 character PIN whenever the card is used.

Short PINs are problematic. First, they admit guessing attacks. Banks defend against this by limiting the number of guesses before they will confiscate the card. Second there is the matter of how to check if a PIN that has been entered is the correct one. Storing the PIN on the card's magnetic stripe is not a good idea because a thief who steals the card can easily determine the associated PIN (and then subvert the 2-factor authentication protocol). Storing an encrypted copy of the PIN on the card's magnetic stripe does not exhibit this vulnerability, though.
Proximity card or RFID. These cards transmit stored information to a monitor via RF. There is currently a debate in this country as to the merits of using RF proximity cards (RFID tags) for identification of people and products. Walmart speaks about puttung RFID tags on every product they shelve, and both the German and U.S. governments are including them in passports. With RFID tags on Walmart products, for example. then somebody with a suitable receiver could tell what you have purchased (even though your purchase is hidden in a bag) --- and this is seen by some as a privacy violation. With RFID tags in passports, somebody with a suitable receiver could remotely identify on the street citizens of a given country and single them out for "special treatment" (likely unpleasant).

There are two types of RF proximity cards: passive and active. The former is not powered, and use the RF energy from the requester to reply with whatever information is being stored by the card. The latter is powered and broadcasts information, allowing anyone who is in range and has a receiver to query the card. You could imagine that if RF tags are put into passports, then some people might start carrying them in special Faraday-cage passport holders, because now an interloper can learn about someone without the victim's knowledge (or permission).
Challenge/Response cards and Cryptographic Calculators. These are also called smart cards and perform some sort of cryptographic calculation. Sometimes the card will have memory, and sometimes it will have an associated PIN. A smart card transforms the authentication problem for humans, because we are no longer constrained by stringent computational and storage limitations. Unfortunately, today's smart cards are vulnerable to power-analysis attacks. Furthermore, one must exercise care in using a cryptographic calculator --- if it is used to generate digital signatures, for example, then somehow the device owner must be made aware of what documents are being signed.

One prevalent form of smartcard is the RSA secure id. It continuously displays encrypted time; and each RSA secure id encrypts with a different key. Whoever has an RSA secure id card responds to server challenges by typing the encrypted time (so, in effect, it is secret) --- a server, knowing what key is associated with each user's card, can then authenticate a user. (The server must be somewhat generous with respect to what times it will accept. Accept too many and replay attacks become possible; accept too few and message delivery delays and execution times prevent people from authenticating themselves).

Something You Are

Since people forget things and lose things, one might contemplate basing an authentication scheme for humans on something that a person is. After all, we recognize people we interact with not because of some password protocol but because of how they look or how they sound --- "something they are". Authentication based on "something you are" will employ behavioral and physiological characteristics of the principal. These characteristics must be easily measured accurately and preferably are things that are difficult to spoof. For example, we might use

Retinal scan
Fingerprint reader
Handprint reader
Voice print
Keystroke timing
Signature

To implement such a biometric authentication scheme some representation for the characteristic of interest is stored. Subsequently, when authenticating that person, the characteristic is measured and compared with what has been stored. An exact match is not expected, nor should it be because of error rates associated with biometric sensors. (For example, fingerprint readers today normally exhibit error rates upwards of 5%.)

Methods to subvert a fingerprint reader give some indication of the difficulties of deploying unsupervised biometric sensors as the sole means of authenticating humans. Attacks include:

Steal a finger. Difficult to do without the owner of the finger noticing. Good supervision of the biometric sensor defends against this attack.
Steal a fingerprint. Lifting a fingerprint is not that hard (at least, according to those TV crime-drama shows). Again, though, good human supervision of the biometric sensor defends against this attack because a guard will notice if somebody is not inserting a naked finger into the reader.
Replace the biometric sensor. At first glance, this type of attack might seem even more difficult to execute than the two above. Social enginnering might be easier for the attacker to employ, here, though. It suffices that the guard believe that the senor should be changed (maybe because the the old one is "broken").

There are several well known problems with biometric-based authentication schemes:

Reliability of the method. Similarity of physical features (faces, hands, or fingerprints) and inaccuracy of measurement may together conspire to create an unacceptably high false acceptance rate (FAR).
Cost and availability. Currently, some readers cost $40-50 and more. Are end users willing to pay that much for an authentication method that does not work as well as passwords?
Unwillingness or inability to interact with biometric input devices. Some people are uncomfortable putting a body part into a machine; some are uncomfortable having lasers shined in their eyes for a retinal scans; and some don't have fingers or eyes to be measured.
Compromise the biometric database or system. It might be possible to circumvent the system's biometric sensor and provide an "input" from another source. The sensor is, after all, connected to a system and hijacking that channel might be possible. Knowledge of the stored representation for a characteristic would then allow an attacker to inject the correct characteristic and impersonate anyone.
Revocation. What does it mean to revoke a fingerprint?

The literature on biometric authentication uses the following vocabulary to characterize what a scheme does and how well it works:

FAR: (false acceptance rate). This is the probability that the system will fail to reject an impostor (aka FMR: false match rate)

FRR: (false reject rate). This is the probability that the system will reject a bona fide principal. (aka FNMR: false non-match rate)

One-to-one matching: Compare live template with a specific stored template in the system. This corresponds to authentication.

One-to-many matching: Compare live templates with all stored templates in the system. This corresponds to identification.

Summary

Having looked at all these methods for authentication, we can see that as a secondary form of authentication (but not identification!) biometrics might be promising. The most likely form of authentication in the future, however, will be a combination of something you have and something you know. Passwords will be around for a long time yet.