# MACs and Digital Signatures
We now turn our attention to protecting the integrity of messages with cryptography.
**Threat:** A Dolev–Yao attacker.
**Harm:** The information contained in messages could be modified, thus
violating integrity.
**Harm:** The purported sender of a message could be changed, thus
violating integrity.
**Vulnerability:** Messages sent on the communication channel between
the sender and receiver can be modified by untrusted principals.
**Countermeasure:**
Like encryption, there are symmetric and asymmetric algorithms for
protecting integrity. The symmetric version is called a *message
authentication code* (MAC). The asymmetric version is called a *digital
signature*.
## Encryption and integrity
Encryption is designed to protect confidentiality. There are some block
cipher modes, designed in the last decade, that protect both
confidentiality and integrity. But encryption does not, in general,
protect integrity. The usual mistake is to reason as follows: "The
message is encrypted. If the attacker changes the ciphertext, it will
decrypt to nonsense. That nonsense can be detected." But that reasoning
is not valid. For example:
- Attackers can be smart about changes. A ciphertext from another
execution of the same protocol might decrypt just fine.
- The plaintext block could itself be a random number, and the
  recipient would have no way of determining whether it's the right one
  if the attacker substitutes a different random number.
- In CTR mode (or any *stream cipher*), it's easy to flip individual
  bits. E.g., change "admin=0" to "admin=1" just by knowing the
  position of that bit in the stream.
- In CBC mode, it's easy to truncate blocks from the beginning of a
message.
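The bit-flipping attack on stream ciphers can be sketched in a few lines of Python. This is a simulation, not real CTR mode: the keystream is just random bytes standing in for AES-CTR output, and the message contents are invented for illustration.

```python
import os

def stream_xor(data: bytes, keystream: bytes) -> bytes:
    """Encrypt/decrypt by XORing with the keystream (models CTR mode)."""
    return bytes(d ^ k for d, k in zip(data, keystream))

keystream = os.urandom(16)          # stands in for AES-CTR output
plaintext = b"admin=0;user=bob"
ciphertext = stream_xor(plaintext, keystream)

# The attacker knows the position of the '0' (index 6) and flips it to
# '1' without knowing the key: XOR that ciphertext byte with '0' ^ '1'.
forged = bytearray(ciphertext)
forged[6] ^= ord("0") ^ ord("1")

print(stream_xor(bytes(forged), keystream))  # b'admin=1;user=bob'
```

The attacker never learns the key or the keystream; flipping a ciphertext bit flips exactly the corresponding plaintext bit, and nothing about the result looks like "nonsense."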
For more examples, see section 9.6.5.i, "Encryption alone does not
guarantee data integrity", in [HAC][hac].
[hac]: http://cacr.uwaterloo.ca/hac/
## Cryptographic hash functions
Both MACs and digital signatures use another primitive that we'll cover
first: a *cryptographic hash function*, also called a *message digest*,
which takes an arbitrary-size input m and produces a fixed-length output H(m).
The output length is typically 128–1024 bits. The goal of a
cryptographic hash is to produce a compact representation of an original
object. That representation should behave much like a fingerprint:
- It's hard to find 2 people with the same fingerprint. That's true
  whether you get to pick pairs of people, or whether you are given
  one person and then must find another. That means fingerprints are
  **collision resistant**.
- Given a person, it's easy to get their fingerprint. But given a
fingerprint, it's hard to find the person it came from. (Which is
why law enforcement invests money in building databases to do just
that.) That means fingerprints are **one way**.
Likewise, cryptographic hash functions must be collision resistant and
one way. Cryptographic hash functions are not the same as the
ordinary hash functions that are used to implement hash tables, even
though both compress their inputs. Collision resistance and one-wayness
are not required of ordinary hash functions.
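The fingerprint analogy is easy to see with SHA-256 from Python's `hashlib` (the inputs below are arbitrary examples):

```python
import hashlib

# Fixed-length digest regardless of input size.
short = hashlib.sha256(b"hi").digest()
long_ = hashlib.sha256(b"x" * 1_000_000).digest()
assert len(short) == len(long_) == 32   # 256 bits

# Deterministic: the same input always yields the same digest ...
assert hashlib.sha256(b"hi").digest() == short
# ... but any change to the input scrambles the output completely.
assert hashlib.sha256(b"hj").digest() != short
```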
The strength of a hash function is, in the absence of any clever
attacks, half the function's output length. E.g., if the output length
is 256 bits, then the strength is at most 128 bits. Why? There's a
generic attack that works on all hash functions that halves the security
level. It's called the *birthday attack*.
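The birthday attack is easy to demonstrate against a deliberately weakened hash. The sketch below invents a toy 24-bit hash by truncating SHA-256; with a 24-bit output, a collision is expected after roughly 2^12 ≈ 4096 hashes, far fewer than the 2^24 a brute-force preimage search would need.

```python
import hashlib

def h24(data: bytes) -> bytes:
    """A toy 24-bit hash: SHA-256 truncated to 3 bytes."""
    return hashlib.sha256(data).digest()[:3]

# Birthday attack: hash distinct inputs until two of them collide.
seen = {}
i = 0
while True:
    m = i.to_bytes(8, "big")
    d = h24(m)
    if d in seen:
        print(f"collision after {i + 1} hashes: {seen[d]!r} vs {m!r}")
        break
    seen[d] = m
    i += 1
```

The same strategy against the full 256-bit SHA-256 would need about 2^128 hashes, which is why the strength is half the output length.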
MD5 and SHA-1 used to be the most commonly used hash functions. But:
- The collision resistance of MD5 (invented by Ron Rivest in 1991) was
broken in 2004–8. It's now possible to find collisions in mere
seconds. Moreover, the collisions can even be engineered to be
(maliciously) useful, for example, [generating rogue CA
certificates](http://www.win.tue.nl/hashclash/rogue-ca/).
- The collision resistance of SHA-1 (released by the NSA in 1995) has
been broken in ongoing work since 2005. Attacks are known that
reduce its strength to only 65 bits or fewer.
SHA-2, released by the NSA in 2001, is actually a whole family of
algorithms, SHA-{224,256,384,512}. The name indicates the output size in
bits. Each should have security level equal to its output size halved.
But these are based on similar ideas to SHA-1, so there's concern that
they might one day turn out to be vulnerable to similar attacks.
SHA-3 was released in 2015. NIST held a public competition for the new
algorithm. There were five finalists, all based on different ideas than
SHA-1 and SHA-2, and all developed openly and peer reviewed. The output
size can be 224, 256, 384, or 512 bits; or a variable-length output can
be produced using a variant called SHAKE.
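The SHA-2 and SHA-3 families, including SHAKE, are all available in Python's `hashlib`, which makes the output-size naming easy to check (the message below is just an example):

```python
import hashlib

msg = b"attack at dawn"

# SHA-2 family: the name gives the output size in bits.
print(hashlib.sha256(msg).hexdigest())       # 64 hex chars = 256 bits
print(hashlib.sha512(msg).hexdigest())       # 128 hex chars = 512 bits

# SHA-3 family (in hashlib since Python 3.6).
print(hashlib.sha3_256(msg).hexdigest())     # also 256 bits

# SHAKE variants produce a caller-chosen output length.
print(hashlib.shake_128(msg).hexdigest(20))  # 20 bytes = 160 bits
```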
## Message authentication codes
A *message authentication code* (MAC) is an algorithm for detecting
modification of messages based on a shared key.
```
0. k = Gen(len) // A and B somehow share key k
1. A: t = MAC(m; k) // t is called the "tag"
2. A -> B: m, t
3. B: verify t = MAC(m; k)
```
The length of input m to MAC may be arbitrary. The output length of MAC
is fixed and depends upon the particular MAC algorithm.
When is a MAC secure? It should behave like a *random function* for each
key. In particular, it shouldn't be possible to predict new (m, t) pairs
without knowing k.
There are many examples of MACs. HMAC (a hash-based MAC) is one of the
most common. It is parameterized on a cryptographic hash function,
which can be instantiated, for example, by any of the SHA family.
Another example is CBC-MAC, which is parameterized on a block cipher
and uses that block cipher in CBC mode.
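The MAC protocol above maps directly onto Python's `hmac` module. This sketch instantiates HMAC with SHA-256; the key and messages are invented for illustration.

```python
import hashlib
import hmac
import os

key = os.urandom(32)                    # 0. k = Gen(len), shared by A and B

def mac(message: bytes) -> bytes:
    """t = MAC(m; k), instantiated as HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()

# 1. A computes the tag, and 2. sends (m, t).
m = b"pay Bob $10"
t = mac(m)

# 3. B recomputes the tag; compare_digest avoids timing side channels.
assert hmac.compare_digest(t, mac(m))

# A modified message (or a tag forged without k) fails verification.
assert not hmac.compare_digest(t, mac(b"pay Bob $1000"))
```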
## Digital signatures
A *digital signature scheme* is a set of algorithms for detecting
modification of messages based on an asymmetric key pair. The public
key for principal A, written K_A, is used to verify A's signatures. The
private key for principal A, written k_A, is used by A to create
signatures.
```
0. (K_A, k_A) = Gen(len)
1. A: s = Sign(m; k_A)
2. A -> B: m, s
3. B: accept if Ver(m; s; K_A)
```
The digital signature scheme is the triple (Gen, Sign, Ver) of
algorithms. Note that Ver takes three inputs: the message to verify, the
purported signature on that message, and the verification key of the
signer.
As with MACs, we want to be able to sign arbitrary length messages. But
these Sign and Ver are public-key algorithms, which operate on big
integers. So, as with public-key encryption, they are constrained to a
limited input size. In practice, messages are therefore hashed before
being signed:
```
0. (K_A, k_A) = Gen(len)
1. A: s = Sign(H(m); k_A)
2. A -> B: m, s
3. B: accept if Ver(H(m); s; K_A)
```
Hashing is such a pervasive practice with signatures that, henceforth,
we'll just assume the message is hashed without bothering to write that
down as part of the protocol.
When is a digital signature scheme secure? It should work like
hand-written signatures. In fact, it should be even better: an adversary
shouldn't be able to forge signatures on new messages, even if given
samples of other signed messages.
Well-known examples of digital signature schemes include the following:
- **Digital Signature Algorithm (DSA).** Released by NIST in 1991 as
part of the Digital Signature Standard (DSS). There is no proof of
security for DSA, but it's been used for decades now without any
serious attacks being discovered. Originally, the standard required
SHA-1 as the hash, but that's since been updated. Also, the
keys were originally short, but have since been lengthened.
- **RSA.** The RSA encryption scheme can be adapted for use as a
digital signature scheme. The Gen algorithm stays the same. Sign is
roughly equivalent to Dec, and Ver to Enc. However, they aren't
exactly the same. Sometimes people will say that you should "do an
RSA decryption to sign". That's partially correct, but also
partially wrong. Here is a set of notes explaining [why RSA
decryption is not the same as RSA signing][rsa-dec-sign].
Also, **never use the same key pair for both signing and
encryption**. There's a provably secure variant of RSA signatures
called RSA-PSS (*probabilistic signature scheme*).
[rsa-dec-sign]: http://www.cs.cornell.edu/courses/cs5430/2015sp/notes/rsa_sign_vs_dec.php
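To make the hash-then-sign pattern and the roles of Sign and Ver concrete, here is a sketch using *textbook* RSA with tiny invented parameters. This is deliberately insecure (no padding, a comically small modulus) and exists only to show the structure; real code should use RSA-PSS from a vetted library.

```python
import hashlib

# Toy RSA parameters -- far too small and unpadded to be secure.
p, q = 1000003, 1000033
n = p * q
e = 65537                                 # public exponent (part of K_A)
d = pow(e, -1, (p - 1) * (q - 1))         # private exponent (part of k_A)

def H(m: bytes) -> int:
    # Hash first, then reduce mod n so the digest fits the modulus.
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def sign(m: bytes, priv: int) -> int:     # s = Sign(H(m); k_A)
    return pow(H(m), priv, n)

def ver(m: bytes, s: int, pub: int) -> bool:  # Ver(H(m); s; K_A)
    return pow(s, pub, n) == H(m)

m = b"pay Bob $10"
s = sign(m, d)
assert ver(m, s, e)                       # B accepts the genuine pair
assert not ver(b"pay Bob $1000", s, e)    # a modified message is rejected
```

Note how Ver takes exactly the three inputs from the protocol: the message, the purported signature, and the signer's public key; and how hashing first lets an arbitrary-length message be signed with fixed-size integer arithmetic.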
## Exercises
1. A novice cryptographer "discovers" a much faster method of generating MACs
for long messages. The idea is to simply compute HMAC of the first 64
bits of the message, and use that as the MAC. What's wrong with this
idea?
2. Consider this MAC algorithm: the tag is a SHA-256 hash of the message.
Why is this MAC algorithm insecure?
3. Investigate either MD5 or SHA-1. Why is your chosen algorithm deemed
insecure for use today? What are the feasible attacks on it?
4. *Repudiation of origin* occurs when a principal denies that it
created a message. For example:
A customer sends a letter to a vendor agreeing to pay a large amount of
money for a product. The vendor ships the product then demands payment.
The customer denies having ordered the product and, by law, is
therefore entitled to keep the unsolicited shipment without payment. If
the vendor can't prove that the letter came from the customer, then the
customer has repudiated the origin of the letter.
A. Consider a symmetric-key encryption scheme. Bob and Alice share a key k.
Bob has a message m that he claims came from Alice, and to prove that he
shows both m and the ciphertext c, where c = Enc(m; k). Can Alice
repudiate Bob's claim of origin? Why or why not?
B. Now consider a digital signature scheme. Alice has a key pair (K_A,
k_A). Bob has a message m and signature s that he claims came from
Alice, where Ver(m; s; K_A) returns true. Can Alice repudiate Bob's
claim of origin? Why or why not?
5. Research both OAEP and PSS. What is the "padding" transformation each applies to
a message to prepare it for encryption or signing? How does the transformation
differ between the two algorithms?
6. Read Chaum's [original paper on untraceable payments][chaum83] (PDF at that link
accessible from an on-campus Cornell IP address). Explain why the bank cannot
determine the payer of a note when the bank receives the note from the payee.
Explain how the bank prevents a note from being spent twice.
[chaum83]: http://link.springer.com/chapter/10.1007%2F978-1-4757-0602-4_18