Certificates

Distributing public keys is a way to enable authentication of machines. Machine A sends a message to machine B, claiming the identity K_A, by signing with k_A. B verifies that signature, hence the identity. But this protocol requires B to know or to ascertain K_A.

One solution is a key server. As the name suggests, this is a server whose job is to distribute keys. Usually, it is a trusted third party that distributes secret (i.e., symmetric) keys. One very well-known solution is Kerberos. Another solution is distribution of certificates. Here, we'll cover two mechanisms for distribution.

Digital Certificates

A digital certificate (henceforth, certificate) is a signature Sign(id_S, K_S; k_I) on the pair of an identity id_S and public key K_S of a subject S. A certificate is created by an issuer I, whose signing key is k_I. The certificate binds id_S to K_S according to I. As a simpler notation for this idea, let I<<S>> be a certificate issued by I for S. Hence I<<S>> = Sign(id_S, K_S; k_I).
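As a minimal sketch of the definition above, the following Python issues a certificate I<<S>> and checks it against I's public key. The signature scheme here is a toy textbook RSA built from small Mersenne primes, chosen only so the example is self-contained; it is insecure and is not how real certificates are signed. The names keygen, encode, issue, and the Certificate fields are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by every toy key


def keygen(a, b):
    """Toy RSA key pair from the Mersenne primes 2**a - 1 and 2**b - 1.

    Returns (K, k): the public modulus K and the signing key k = (modulus, d).
    Illustration only: keys this small are trivially breakable.
    """
    p, q = 2**a - 1, 2**b - 1
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return n, (n, d)


def digest(data, n):
    """Hash the data and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, k):
    n, d = k
    return pow(digest(data, n), d, n)


def ver(data, sig, K):
    return pow(sig, E, K) == digest(data, K)


@dataclass(frozen=True)
class Certificate:
    issuer_id: str
    subject_id: str   # id_S
    subject_key: int  # K_S
    signature: int    # Sign(id_S, K_S; k_I)


def encode(subject_id, subject_key):
    """Hypothetical byte encoding of the pair (id_S, K_S)."""
    return f"{subject_id}|{subject_key}".encode()


def issue(issuer_id, issuer_k, subject_id, subject_K):
    """Create I<<S>> by signing (id_S, K_S) with the issuer's key k_I."""
    sig = sign(encode(subject_id, subject_K), issuer_k)
    return Certificate(issuer_id, subject_id, subject_K, sig)


K_I, k_I = keygen(61, 89)    # issuer I
K_S, k_S = keygen(107, 127)  # subject S
cert = issue("I", k_I, "S", K_S)

# Anyone holding K_I can check the binding of id_S to K_S.
print(ver(encode(cert.subject_id, cert.subject_key), cert.signature, K_I))
```

Because the signature covers both id_S and K_S, changing either one after issuance makes verification fail.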

In practice, the most common format for certificates is X.509, an international standard. Beyond id_S, K_S, and the signature, an X.509 certificate also contains a serial number, a validity interval, and several other fields.

If machine A has acquired a certificate I<<S>>, and if A trusts I to issue that certificate, then A has a basis to verify S's identity:

Authentication with Certificates

S: Let m = "I'm id_S". Compute s = Sign(m; k_S).

S → A: m, s

A: Find a certificate I<<S>> from a trusted issuer I. Verify the signature of I on the certificate. Verify that the identifier in the certificate is id_S. Retrieve K_S from the certificate. Accept if Ver(m; s; K_S).

Note that one common implementation error is to forget to verify id_S. Indeed, sometimes crypto libraries will not do this for you by default.
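The protocol, including the identity check that is easy to forget, can be sketched in Python as follows. This is an illustration under stated assumptions, not a real implementation: signatures are a toy textbook RSA over small Mersenne primes (insecure), certificates are plain dictionaries, and the helper names (issue, authenticate, cert_body) and the impostor "Mallory" are hypothetical.

```python
import hashlib

E = 65537  # public exponent shared by every toy key


def keygen(a, b):
    """Toy RSA keys from the Mersenne primes 2**a - 1 and 2**b - 1 (insecure)."""
    p, q = 2**a - 1, 2**b - 1
    n = p * q
    return n, (n, pow(E, -1, (p - 1) * (q - 1)))


def digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, k):
    n, d = k
    return pow(digest(data, n), d, n)


def ver(data, sig, K):
    return pow(sig, E, K) == digest(data, K)


def cert_body(subject_id, subject_K):
    return f"{subject_id}|{subject_K}".encode()


def issue(issuer_k, subject_id, subject_K):
    """Issuer signs the pair (id, key); the certificate is a plain dict here."""
    return {"id": subject_id, "key": subject_K,
            "sig": sign(cert_body(subject_id, subject_K), issuer_k)}


PREFIX = "I'm "


def authenticate(m, s, cert, trusted_K):
    """A's side of the protocol; trusted_K is the public key of trusted issuer I."""
    text = m.decode()
    if not text.startswith(PREFIX):
        return False
    claimed_id = text[len(PREFIX):]
    # 1. Verify the issuer's signature on the certificate.
    if not ver(cert_body(cert["id"], cert["key"]), cert["sig"], trusted_K):
        return False
    # 2. Verify that the certificate is for the claimed identity.
    if cert["id"] != claimed_id:
        return False
    # 3. Verify the message signature under the key from the certificate.
    return ver(m, s, cert["key"])


K_I, k_I = keygen(61, 89)
K_S, k_S = keygen(89, 107)
K_M, k_M = keygen(107, 127)
cert_S = issue(k_I, "S", K_S)
cert_M = issue(k_I, "Mallory", K_M)  # a perfectly valid certificate for Mallory

m = b"I'm S"
honest = authenticate(m, sign(m, k_S), cert_S, K_I)
impostor = authenticate(m, sign(m, k_M), cert_M, K_I)
print(honest, impostor)
```

The impostor's certificate and signature are both valid on their own; only the comparison of the certificate's identifier against the claimed identity rejects her.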

The final step of the protocol requires A to verify I's signature on the certificate. If A already knows I's public key, that's not a problem. But if not, it seems like all we've done is to make a recursive call: to verify S's key, A needs to verify I's key. And that might involve finding and verifying even more certificates. Where does the recursion end? We need help managing certificates!

Systems for managing certificates are called public key infrastructures (PKIs). There are two philosophies for building PKIs:

Decentralized: anarchy, no publicly recognized leaders, e.g., PGP
Centralized: oligarchy, leadership of a few elite, e.g., CAs

PGP: Pretty Good Privacy

History: PGP, a program implementing encryption and digital signature schemes, was invented by Phil Zimmermann in 1991. He made the program and its source code publicly available. This was notable as the first time popular open-source software for public-key crypto had been available. The US Customs Service launched a criminal investigation for violation of export laws. High-strength crypto was classified as a munition, so there was some question whether Zimmermann might have been guilty of arms trafficking. After three years, the investigation was dropped.

The philosophy of PGP is decentralized. Management of certificates devolves to users; anyone can certify anyone else and distribute that certificate as they wish.

With PGP, a machine M accumulates certificates that it trusts on a keyring. The machine's own key is on that keyring as a self-signed certificate M<<M>>. From time to time, M might synchronize with another machine N, importing keys from N's keyring.

When a machine acquires a certificate, it records how much trust to place in the issuer, hence in the certificate. The result is a web of trust: a machine can have multiple certificates for a given key from multiple issuers. There's no central repository of certificates that has to be trusted. So there's also no central service that could be taken down, DOS'd, coerced, seized by law enforcement, etc. Nonetheless, it's convenient to have such servers, and they do exist. MIT runs one, for example.

Example. (Although this example is phrased in terms of humans, it applies equally well to machines.) Suppose Bob sends Alice a message that he signs. Alice now needs to verify the signature, and to do that, she needs a certificate for Bob. She searches for certificates for Bob, and manages to find Dave<<Bob>>, Bob<<Bob>>, and Carol<<Bob>>. Alice doesn't happen to know Dave, Bob, or Carol. So she searches around for certificates for them. She manages to find a certificate Ellen<<Carol>>, and realizes that Ellen is a friend of hers whom she trusts, and that Ellen's public key is already in her (Alice's) keyring, as Alice<<Ellen>>. So Alice has Alice<<Alice>>, Alice<<Ellen>>, Ellen<<Carol>>, Carol<<Bob>>. After verifying all those certificates, Alice can create a new certificate Alice<<Bob>>, put it in her keyring, and later distribute it to others. And, of course, she can verify Bob's signature on the message.

In the example above, the list of certificates is a certificate chain, starting with Alice and ending with Bob. At the root is a key that Alice trusts (her own), called the root of trust.

Note that it might not be possible to find a path through the web of trust that enables creation of a certificate chain.
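Finding a certificate chain amounts to searching for a path in a directed graph whose edges are certificates. The Python sketch below runs a breadth-first search over the certificates from the example. It is a simplification under stated assumptions: each certificate is reduced to an (issuer, subject) pair of names, signature verification and trust levels are omitted, and the function name find_chain is hypothetical.

```python
from collections import deque


def find_chain(certs, root, target):
    """Return a chain of (issuer, subject) pairs from root to target, or None.

    The chain starts with the root's self-signed certificate, the root of trust.
    """
    by_issuer = {}
    for issuer, subject in certs:
        if issuer != subject:  # self-signed certificates add no new edges
            by_issuer.setdefault(issuer, []).append(subject)

    parent = {root: None}  # who certified each principal we have reached
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            chain = []
            while parent[node] is not None:
                chain.append((parent[node], node))
                node = parent[node]
            return [(root, root)] + chain[::-1]
        for nxt in by_issuer.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no path through the web of trust


certs = [
    ("Dave", "Bob"), ("Bob", "Bob"), ("Carol", "Bob"),
    ("Ellen", "Carol"), ("Alice", "Ellen"), ("Alice", "Alice"),
]
print(find_chain(certs, "Alice", "Bob"))
print(find_chain(certs, "Alice", "Zed"))
```

On the example's certificates the search recovers the chain Alice<<Alice>>, Alice<<Ellen>>, Ellen<<Carol>>, Carol<<Bob>>, and it returns None for a subject that no trusted path reaches. A real implementation would also verify every signature along the chain.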

CAs: Certificate Authorities

History: The earliest reference I can find is: [The Digital Distributed System Security Architecture. Morrie Gasser, Andy Goldstein, Charlie Kaufman, Butler Lampson. Proc. 12th National Computer Security Conf., NIST/NCSC, Baltimore, 1989, pp 305–319.] CAs were used in early browsers, such as Netscape's.

The philosophy behind CAs is to create principals called certificate authorities (CAs) whose raison d'être is to issue certificates. Everyone has the CA's own self-signed certificate CA<<CA>> pre-installed on their system. When machine S wants a certificate for itself, binding id_S to K_S, then S contacts the CA, enrolls, and the CA issues certificate CA<<S>> to S. When A wants to authenticate S, then S must present its certificate CA<<S>> to A.

Immediately we have two problems: (i) there is no single CA that everyone in the world trusts, and even if there were, (ii) that CA would be a single point of failure. So in reality there are many CAs: somewhere around 1500 have been observed on the public internet in recent years. Your university/company/bank could be a CA for its students/employees/customers. And there are commercial CAs, e.g., Verisign, that will issue certificates if you pay them enough. Your browser actually comes with the certificates for those commercial CAs preinstalled. Sometimes if you pay the CA more, they do better verification of attributes at enrollment, and you get an enhanced certificate that causes the browser to display your name more prominently.

CAs as implemented in practice are hierarchical: a master CA will certify a subordinate CA as Master<<Sub>>, then the subordinate will issue certificates itself. And subordinates could themselves have subordinates, etc. So we have certificate chains again, for example, Master<<Sub1>>, Sub1<<Sub2>>, Sub2<<Alice>>.
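Validating such a chain means walking it from the pre-installed master key down to the leaf, using each verified certificate's key to check the next one. The Python sketch below illustrates this under stated assumptions: signatures are a toy textbook RSA over small Mersenne primes (insecure), certificates are plain dictionaries, and the names issue and verify_chain are hypothetical. Real X.509 path validation checks more, such as validity intervals and whether each intermediate is permitted to act as a CA.

```python
import hashlib

E = 65537  # public exponent shared by every toy key


def keygen(a, b):
    """Toy RSA keys from the Mersenne primes 2**a - 1 and 2**b - 1 (insecure)."""
    p, q = 2**a - 1, 2**b - 1
    n = p * q
    return n, (n, pow(E, -1, (p - 1) * (q - 1)))


def digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, k):
    n, d = k
    return pow(digest(data, n), d, n)


def ver(data, sig, K):
    return pow(sig, E, K) == digest(data, K)


def issue(issuer_id, issuer_k, subject_id, subject_K):
    body = f"{subject_id}|{subject_K}".encode()
    return {"issuer": issuer_id, "id": subject_id, "key": subject_K,
            "sig": sign(body, issuer_k)}


def verify_chain(chain, root_id, root_K):
    """Return the leaf's public key if every link verifies, else None."""
    issuer_id, issuer_K = root_id, root_K  # start from the pre-installed root
    for cert in chain:
        body = f"{cert['id']}|{cert['key']}".encode()
        if cert["issuer"] != issuer_id or not ver(body, cert["sig"], issuer_K):
            return None
        # The subject of this certificate is the issuer of the next one.
        issuer_id, issuer_K = cert["id"], cert["key"]
    return issuer_K


K_master, k_master = keygen(61, 89)
K_sub1, k_sub1 = keygen(89, 107)
K_sub2, k_sub2 = keygen(107, 127)
K_alice, k_alice = keygen(61, 127)
K_mallory, k_mallory = keygen(61, 107)

chain = [
    issue("Master", k_master, "Sub1", K_sub1),
    issue("Sub1", k_sub1, "Sub2", K_sub2),
    issue("Sub2", k_sub2, "Alice", K_alice),
]
# Mallory claims to be Sub2 but signs with her own key.
forged = chain[:2] + [issue("Sub2", k_mallory, "Alice", K_mallory)]

print(verify_chain(chain, "Master", K_master) == K_alice)
print(verify_chain(forged, "Master", K_master))
```

The forged chain fails at its last link because the signature does not verify under the key that Sub1 certified for Sub2.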

Revocation

Regardless of the distribution philosophy, certificates have a substantial problem when it comes to revocation, that is, what happens when a certificate needs to be withdrawn. Revocation might happen periodically as a key's cryptoperiod expires, so old certificates need to be replaced by new ones. Revocation might also be necessary because the subject's private key was compromised and is no longer trustworthy, or because the issuer's was.

Here are three strategies for implementing revocation:

Certificate revocation lists (CRL): the CA publishes lists of revoked certificates. Machines download and check that list every time they need to verify a certificate. Downloading the list is expensive, so implementors will naturally cache, leading to time-of-check to time-of-use (TOCTOU) attacks.
Online certificate validation: the CA stands up validation servers that accept certificates as input and determine whether they have been revoked. Machines contact one of these servers every time they need to verify a certificate. The validation servers now become a potential point of attack, and DOS against them leads to the inability to authenticate.
Fast expiration: the CA issues certificates only with short validity, e.g., 10 min to 24 hrs. CAs now have to issue certificates with much greater frequency, and machines frequently need to acquire new certificates.
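To make the caching hazard with CRLs concrete, here is a small Python sketch of a client-side CRL cache, together with the validity-interval check that fast expiration relies on. Everything in it is an assumption made for illustration: the class CrlCache, the fetch callback standing in for a download from the CA, the serial number 42, and the use of plain integers as timestamps in seconds.

```python
class CrlCache:
    """Client-side cache of a CA's revocation list (illustrative only)."""

    def __init__(self, fetch, max_age):
        self.fetch = fetch        # callable returning the CA's current revoked serials
        self.max_age = max_age    # seconds a downloaded list is reused
        self.fetched_at = None
        self.revoked = frozenset()

    def is_revoked(self, serial, now):
        # Download a fresh list only when there is no copy or the copy is too old.
        if self.fetched_at is None or now - self.fetched_at >= self.max_age:
            self.revoked = frozenset(self.fetch())
            self.fetched_at = now
        return serial in self.revoked


def within_validity(not_before, not_after, now):
    """Fast expiration: accept a certificate only inside its validity interval."""
    return not_before <= now <= not_after


ca_crl = set()                        # the list as published by the CA
cache = CrlCache(lambda: ca_crl, max_age=3600)

first = cache.is_revoked(42, now=0)   # list downloaded at time 0
ca_crl.add(42)                        # CA revokes serial 42 shortly afterwards
stale = cache.is_revoked(42, now=20)  # cached copy still says "not revoked"
fresh = cache.is_revoked(42, now=3600)  # cache expired, so the list is re-downloaded

print(first, stale, fresh)
print(within_validity(0, 600, 300), within_validity(0, 600, 601))
```

Between the revocation and the next download, the client keeps accepting the revoked certificate; that window is the TOCTOU exposure. Shrinking max_age narrows the window at the cost of more downloads, which is the same tradeoff that short validity intervals make on the issuing side.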

Revocation is a serious practical problem. Consider, for example, what happens when an issuer's key is compromised. One large breach of this kind occurred in March 2011. CA Comodo had nine rogue certificates issued through it for Gmail, Hotmail, Google, Yahoo, Skype, Firefox, etc. Whoever got those certificates could impersonate those businesses to anyone who didn't know the certificates had been revoked. Browsers and OSs had to push out updates to revoke the certificates. So we're placing a lot of trust in CAs. Trust is an assumption, and assumptions are vulnerabilities...