RMI and security

Due date: Friday, December 3rd

Note: you can download the required version of the storage server code from here:
storage-3-2.zip

What you need to do

In this project, you will add three features to the storage server:

the use of Java's Remote Method Invocation (RMI) mechanism to implement client-server communication
the ability to execute client code on the storage server
security

Remote method invocations automate much of the tedious work that has so far been performed manually in the storage server. RMI also provides the necessary mechanism to execute client code on the storage server. The idea behind moving client computations to the storage server is that it is sometimes less expensive to move a computation to the storage units than it is to move the storage units to the client. Security has been a missing feature in all previous storage server implementations; allowing client code to execute on the server only increases the need for security mechanisms.

RMI

The current implementation of the storage server uses a very simple Remote Procedure Call (RPC) implementation. The implementation is simple for three reasons:

The implementation includes hand-written code to convert objects that must travel to the server (and back) to an array of bytes that can be transported across the network. Converting data objects into a flat byte stream is called marshaling; the reverse process is called unmarshaling. Full-fledged RPC systems provide a stub compiler that can generate marshaling and unmarshaling code from an interface description.
The network address of the server is more or less hardwired into the client code, which is undesirable. Normally, the server address is obtained from a name server through a process called binding.
The storage server relies on the User Datagram Protocol (UDP) to deliver request and reply messages. UDP, however, is an unreliable protocol. RPC systems deliver messages reliably.

Java's RMI mechanism is an object-oriented flavor of RPC. The RMI distribution includes a stub compiler, rmic, which you can use to generate marshaling and unmarshaling code. In RMI's terminology, marshaling is called serialization and unmarshaling is called deserialization. Binding takes place by means of a server called the RMI registry. Internally, RMI uses TCP/IP to transport messages reliably.

Briefly, RMI works as follows. The implementor of the server writes a Java interface which defines the methods that the server exports to its clients. The implementor also provides a class that implements this interface. Finally, the implementor generates marshaling and unmarshaling code using the stub compiler. The service can now be made available by creating an instance of the server class and by registering this instance with the RMI registry under a well-known name. (This is done using Naming.rebind().)

Clients can query the RMI registry for the name of the service (using Naming.lookup()). If the registry knows the service, it will return to the client an object that implements the service interface. Such an instance is, quite appropriately, called a remote object. The client can invoke the methods defined in the server interface on the remote object. The method's actual parameters will automatically be marshaled into a message and transmitted to the server. At the server, a (popup!) thread is created that unmarshals the message and invokes the server's implementation of the method. After the method has executed, the return value is marshaled into a reply message which is sent back to the client. At the client, the result is unmarshaled and returned to the caller of the method.
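The steps above can be sketched inside a single JVM as follows. This is a minimal illustration, not the storage server's code: EchoService, EchoServiceImpl, and the binding name are hypothetical. The sketch starts an in-process registry on an anonymous port and uses Registry.rebind() and Registry.lookup(); with a separately started rmiregistry you would use Naming.rebind() and Naming.lookup() with a URL instead. It also assumes a later Java release that generates stubs dynamically, so rmic is not run here.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiSketch {
    /** Hypothetical service interface: the methods exported to clients. */
    public interface EchoService extends Remote {
        String echo(String msg) throws RemoteException;
    }

    /** Hypothetical server-side implementation of the interface. */
    public static class EchoServiceImpl implements EchoService {
        @Override
        public String echo(String msg) {
            return "echo: " + msg;
        }
    }

    /** Exports a server object, registers it, looks it up, and invokes it. */
    static String roundTrip(String msg) {
        // Keep the sketch on the loopback interface.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        EchoServiceImpl impl = new EchoServiceImpl();
        try {
            // Server side: export the object; clients will hold the returned stub.
            EchoService stub = (EchoService) UnicastRemoteObject.exportObject(impl, 0);
            Registry registry = LocateRegistry.createRegistry(0); // anonymous port
            try {
                registry.rebind("EchoService", stub);
                // Client side: obtain the remote object and invoke a method on it.
                EchoService remote = (EchoService) registry.lookup("EchoService");
                return remote.echo(msg);
            } finally {
                // Unexport so that the JVM can exit.
                UnicastRemoteObject.unexportObject(registry, true);
                UnicastRemoteObject.unexportObject(impl, true);
            }
        } catch (RemoteException | NotBoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The call remote.echo(msg) goes through the stub, so the argument and the return value are marshaled and sent over a local TCP connection even though client and server share a JVM here.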

So far, RMI does not differ from traditional RPC systems. The main difference is that RMI allows clients to pass objects as parameters to a remote object's methods. As with normal method calls, the caller of the method may supply an instance of a subclass of the method's formal argument. This is an interesting feature, because in a distributed environment the callee (i.e., the server) may not have access to the subclass's code. In this situation, RMI will automatically fetch the subclass's code from the client. This way, client code is shipped to the server.

The details of using RMI are explained in the RMI documentation. Read this documentation carefully. Work your way through the example.

In the RMI part of this project, you must modify the storage server to use RMI instead of UDP. You should do this without modifying the application programming interface (API) in class il.ac.idc.storage.StorageServer. This interface includes a new method

public byte[] rexec(Credentials cred, ClientCode cc).

This method allows client code to be moved to the server. ClientCode.java is provided in the new storage server distribution. This class defines a single (abstract) method

public abstract byte[] run(OnServer server).

The role of OnServer will be explained shortly. Clients can define a subclass of ClientCode and can pass an instance of this subclass as a parameter to rexec(). This subclass must implement the run() method, which will contain client-specific code that can operate on storage units (see below). The server's RMI class loader will download the subclass's code when the implementation of rexec() invokes cc's run() method. After the code has been downloaded, the client's run() method is invoked.

The client's run() method can operate on the data storage units managed by the server by invoking the following two methods on server:

byte[] readUnit(StorageID sid, long start, long len)
void writeUnit(StorageID sid, long start, long len, byte[] data)

Both methods are defined in the new interface OnServer. You must implement this interface.
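To make the roles of ClientCode and OnServer concrete, here is a minimal sketch of client code that sums the bytes of a storage unit on the server and returns only the 8-byte result. The nested StorageID, OnServer, and ClientCode types are stand-ins written for this example; only the method signatures are taken from the text above, and whether the real ClientCode is Serializable is an assumption. ByteSum and InMemoryServer are hypothetical names, and the sketch calls run() directly rather than through rexec().

```java
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class RexecSketch {
    // Stand-ins for the types shipped with the storage server distribution.
    static final class StorageID implements Serializable {
        final long id;
        StorageID(long id) { this.id = id; }
    }

    interface OnServer {
        byte[] readUnit(StorageID sid, long start, long len);
        void writeUnit(StorageID sid, long start, long len, byte[] data);
    }

    abstract static class ClientCode implements Serializable {
        public abstract byte[] run(OnServer server);
    }

    /** Hypothetical client code: the unit stays on the server, only the sum travels back. */
    static final class ByteSum extends ClientCode {
        private final StorageID sid;
        private final long len;

        ByteSum(StorageID sid, long len) { this.sid = sid; this.len = len; }

        @Override
        public byte[] run(OnServer server) {
            long sum = 0;
            for (byte b : server.readUnit(sid, 0, len)) {
                sum += b & 0xFF;
            }
            return ByteBuffer.allocate(Long.BYTES).putLong(sum).array();
        }
    }

    /** Toy in-memory OnServer, used only to exercise the client code. */
    static final class InMemoryServer implements OnServer {
        private final Map<Long, byte[]> units = new HashMap<>();

        @Override
        public byte[] readUnit(StorageID sid, long start, long len) {
            byte[] unit = units.getOrDefault(sid.id, new byte[0]);
            int from = (int) Math.min(start, unit.length);
            int to = (int) Math.min(start + len, unit.length);
            return Arrays.copyOfRange(unit, from, to);
        }

        @Override
        public void writeUnit(StorageID sid, long start, long len, byte[] data) {
            byte[] unit = units.getOrDefault(sid.id, new byte[0]);
            byte[] grown = Arrays.copyOf(unit, (int) Math.max(unit.length, start + len));
            System.arraycopy(data, 0, grown, (int) start, (int) len);
            units.put(sid.id, grown);
        }
    }

    /** Writes three bytes to a unit and runs ByteSum against it. */
    static long demo() {
        InMemoryServer server = new InMemoryServer();
        StorageID sid = new StorageID(1);
        server.writeUnit(sid, 0, 3, new byte[] {1, 2, 3});
        return ByteBuffer.wrap(new ByteSum(sid, 3).run(server)).getLong();
    }
}
```

In the project itself, your implementation of rexec() would pass your own OnServer implementation to cc.run(), and that implementation is where capability checks belong.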

Security

The ability to execute client code on the server introduces serious security concerns. Unless you take precautions, the client code can easily attack the server. For example, the client's code may be able to delete any file that the server has access to or it may hang up the server. Besides the security problems introduced by mobile code, the storage server suffers from several other, more basic problems.

Clients are not authenticated. Any process that knows how to talk to the storage server can create, read, write, or delete a storage unit.
The server does not perform access control. Even if the server knows to which client it is talking, there is nothing to prevent that client from accessing storage units created by other clients. The storage server does not keep track of which clients have access to which storage units.
Message integrity is not guaranteed. It is quite easy for a malicious process to modify messages transmitted by other processes.

To address these problems, you will use a variety of security techniques: secret-key cryptography, capabilities, code signing, and privileged code.

Secret-key cryptography

Secret-key cryptography allows two (or more) parties to communicate data in a confidential manner by encrypting and decrypting that data. Encryption and decryption are performed by means of an encryption algorithm that takes two inputs: the data to be encrypted or decrypted and a secret key:

cipher_text = encrypt(key, plain_text)

plain_text = decrypt(key, cipher_text)

The key is shared by all communicating parties. The encryption algorithm is usually known to the entire world, but the key should be kept secret. The main problem with secret-key cryptography is key distribution. The communicating parties need to agree on a secret key before they can communicate securely. To distribute this key in a secure manner to all parties, however, a secure channel is needed... In this project, we will not address this problem. In practice, it is usually solved by introducing a trusted third party or by using another type of cryptography, public-key cryptography, to distribute a key.

Secret-key cryptography can be used to preserve the confidentiality of messages that travel between a client and the storage server. We will assume, however, that confidentiality is not required, so you do not have to encrypt your messages. Instead, we will use secret-key cryptography for authentication purposes. Authentication is usually achieved by showing that you have or know something. In this project, you must arrange for every client to share a key with the server. A client can authenticate itself to the storage server by showing that it knows the secret key that it shares with the server. (Clearly, different clients must be given different keys.)

One way a client can show that it knows a key is by sending that key to the server. This is a bad idea: anyone listening in on the conversation can obtain the key. A better way is for the server to send a challenge (a random number) to the client and for the client to reply with the encrypted value of the challenge. The server can decrypt this reply and check that the decrypted value equals its challenge. This way, the client can show that it knows the secret key without revealing the key.
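The challenge-response exchange can be sketched as follows. This is an illustration under assumptions: it uses AES from javax.crypto as a stand-in for the project's CryptKey class (whose interface is not described here), a 16-byte key, and a 16-byte random challenge that fits exactly one cipher block. All class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class ChallengeResponseSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int BLOCK = 16; // AES block size in bytes

    /** Server: pick a fresh random challenge. */
    static byte[] newChallenge() {
        byte[] challenge = new byte[BLOCK];
        RANDOM.nextBytes(challenge);
        return challenge;
    }

    /** Client: prove knowledge of the key by encrypting the challenge. */
    static byte[] respond(byte[] key, byte[] challenge) {
        return crypt(Cipher.ENCRYPT_MODE, key, challenge);
    }

    /** Server: decrypt the reply and compare it with the challenge it sent. */
    static boolean verify(byte[] key, byte[] challenge, byte[] response) {
        if (response == null || response.length != BLOCK) {
            return false;
        }
        return MessageDigest.isEqual(challenge, crypt(Cipher.DECRYPT_MODE, key, response));
    }

    // A single random block is processed, so no chaining mode or padding is involved.
    private static byte[] crypt(int mode, byte[] key, byte[] block) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The key itself never crosses the network: an eavesdropper sees only the challenge and its encryption, and a fresh challenge is used for every exchange.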

The problem with the scheme above is that it requires a three-way handshake between the client and the server. We do not want to perform such a handshake for each request issued by the client. A simpler scheme, one that uses only a single message, can be used if the client and server clocks are loosely synchronized. The main problem with this simpler scheme is that it is sensitive to replay attacks: another process can record the message sent by the client to the server. Later, this third process can replay the message to the server. If you implement this scheme, you must deal with this type of attack.
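One possible shape for such a single-message scheme is sketched below. It is an assumption-laden illustration, not a prescribed protocol: the client attaches a timestamp, a nonce, and a keyed hash (HMAC-SHA256, swapped in here for "encrypting" the timestamp) to its request; the server rejects stale timestamps and remembers the messages it has already accepted. The class name, the 30-second window, and the message layout are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class TimestampAuthSketch {
    static final long WINDOW_MS = 30_000; // assumed tolerance for clock skew

    private final byte[] key;                        // key shared with one client
    private final Set<String> seen = new HashSet<>(); // requests already accepted

    TimestampAuthSketch(byte[] key) {
        this.key = key.clone();
    }

    /** Client: compute the authenticator over (clientId, timestamp, nonce). */
    static byte[] tag(byte[] key, int clientId, long timestampMs, long nonce) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + 2 * Long.BYTES);
            buf.putInt(clientId).putLong(timestampMs).putLong(nonce);
            return mac.doFinal(buf.array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server: accept only fresh, correctly tagged, never-before-seen requests. */
    boolean accept(int clientId, long timestampMs, long nonce, byte[] tag, long nowMs) {
        if (Math.abs(nowMs - timestampMs) > WINDOW_MS) {
            return false; // too old or too far in the future
        }
        if (!MessageDigest.isEqual(tag, tag(key, clientId, timestampMs, nonce))) {
            return false; // sender does not know the key
        }
        // Set.add returns false if this exact request was accepted before (a replay).
        return seen.add(clientId + ":" + timestampMs + ":" + nonce);
    }
}
```

Entries older than the window can be purged from the set, because the timestamp check already rejects them. A real request would also include the request body in the tagged data, so that the body cannot be modified in transit.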

Capabilities

An access control mechanism allows one to specify who has access to what. One way to do this is to store an Access Control List (ACL) with each object. An ACL lists who has access to the resource that the ACL is associated with. Before a process is given access to the resource, the operating system checks if the user on whose behalf the process runs is listed in the ACL (explicitly or implicitly, as a member of some group). Windows NT and Solaris use ACLs to protect files.

In this project, we will use another mechanism, capabilities. A capability is like a ticket: if you have it, you get in, otherwise you are kept out. The difference between these software tickets and paper tickets is that copying is easy and legal. The owner of a capability is allowed to copy its capability and to hand the copy to another party. From then on, the receiver of the copy can access the resource too.

You are to modify the storage server so that it creates a capability for each storage unit that it creates. This capability must be returned to the creating process. When a process wishes to access a storage unit, it must present its capability for that storage unit to the storage server. Since other processes may listen in on the conversation, capabilities must be encrypted, or else they can be used by other processes. When the server receives a request to access a storage unit, it must check that the capability stored in that request gives access to the storage unit.

Capabilities are conveniently represented as (large) random numbers. If the set from which the random number is chosen is sufficiently large, then the probability that an adversary can guess a capability is negligible.
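A minimal sketch of random-number capabilities follows. The class name, the use of a plain long as the storage unit identifier, and the 128-bit capability size are assumptions made for this example. With 128 random bits, a single guess succeeds with probability 2^-128.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class CapabilitySketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int CAPABILITY_BYTES = 16; // 128 bits

    // Server-side table: storage unit identifier -> capability for that unit.
    private final Map<Long, byte[]> capabilityByUnit = new HashMap<>();

    /** Called when a storage unit is created; the result is returned to the creator. */
    byte[] createCapability(long unitId) {
        byte[] capability = new byte[CAPABILITY_BYTES];
        RANDOM.nextBytes(capability);
        capabilityByUnit.put(unitId, capability);
        return capability.clone();
    }

    /** Called on every access: does the presented capability match this unit? */
    boolean grantsAccess(long unitId, byte[] presented) {
        byte[] expected = capabilityByUnit.get(unitId);
        return expected != null
                && presented != null
                && MessageDigest.isEqual(expected, presented);
    }
}
```

Because a capability is just a number, the creator can hand a copy to another process, which can then present it exactly as the creator would.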

Executing client code safely

The server should not execute malicious client code. In particular, client code should be able to access only those storage units for which it can present a capability. SDK 2 provides a mechanism that assigns permissions to protection domains. When a Java virtual machine downloads a class, it creates a new protection domain and runs the downloaded code in that protection domain. The thread that executes the code in the new protection domain can do everything that is allowed by the permissions of the protection domain. When the thread enters another protection domain - e.g., when client code invokes a server routine - it executes with the intersection of the permissions of both domains.

A policy file assigns permissions to downloaded code. Code that originates from different sources, called code bases, can be assigned different permissions. Permissions can also depend on who signed the code that is downloaded. You must write a sensible policy file for the virtual machine that executes the storage server code (and the downloaded client code). This policy file should give the server full access to the files it manages. Client code should run in a very restricted mode. In particular, a client should not be able to corrupt server data structures and should be given only restricted access to the files that contain storage units.

Code signing is based on public-key cryptography. With this type of cryptography, the sender and the recipient of a message need not share a secret key (as with secret-key cryptography). With public-key cryptography, each party generates a key pair, which consists of a private key and a public key. The public key is given to anyone who wants to have it. You can publish your public key on your web page. The private key, in contrast, should not be given away.

Public-key cryptography can be used to encrypt data in the following way:

The sender of a message encrypts the message using the recipient's public key PUBKr (which is known to everyone):

cipher_text = encrypt(PUBKr, plain_text)

The receiver of the message decrypts the message using his or her private key PRIVKr:

plain_text = decrypt(PRIVKr, cipher_text)

Public-key cryptography can also be used to sign data. Here the goal is not to hide the data, but to prove that the data was sent by a particular person or organization. This is done by attaching a digital signature to the data. A sender can sign her message by encrypting that message with her private key; the result of that operation is the digital signature. Clearly, only the sender can produce the digital signature, because only the sender knows her private key. After creating the signature, the sender sends both her message and the signature to the receiver. The receiver verifies that the message was sent by the sender by decrypting the signature using the sender's public key. The result should match the message that the sender sent along with her signature.

Summarizing, this is how signing works (in theory). (PRIVKs is the private key of the sender; PUBKs is the public key of the sender).

sender computes: signature = encrypt(PRIVKs, plain_text)
send to receiver: plain_text and signature
receiver computes: x = decrypt(PUBKs, signature)
If x equals plain_text, then the receiver knows the message was sent by the owner of PUBKs.

Public-key operations are expensive. In practice, the sender would not sign her entire message, but a small hash of that message. Computing the hash is not expensive and since the hash is small, signing it isn't very expensive either.
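The hash-then-sign idea can be tried out with the java.security.Signature class, as in the sketch below. The algorithm names (RSA, SHA256withRSA) and the key size are choices made for this example, not requirements of the project, and SigningSketch is a hypothetical name. For the project itself you sign code with jarsigner rather than by calling this API.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;

class SigningSketch {
    /** Each party generates its own key pair (private key plus public key). */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender: hash the message with SHA-256 and sign the hash with the private key. */
    static byte[] sign(PrivateKey privateKey, byte[] message) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(message);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver: check the signature against the message using the sender's public key. */
    static boolean verify(PublicKey publicKey, byte[] message, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(message);
            return verifier.verify(signature);
        } catch (SignatureException e) {
            return false; // malformed signature
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Verification fails if either the message or the signature is changed after signing, which is what lets a server detect tampered code.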

Your policy file must enforce that all client code be signed by known parties. You should at least be able to execute code signed by the "CS415 tester". To be able to verify the digital signature of the CS415 tester, your server needs to know the public key that corresponds to the private key that the CS415 tester uses to sign his code. This public key is encapsulated in this (self-signed) certificate. The human-readable version of this certificate is as follows:

Owner: CN=CS415 tester, OU=Dept. of Computer Science, O=Cornell University, L=Ithaca, ST=New York, C=NY
Issuer: CN=CS415 tester, OU=Dept. of Computer Science, O=Cornell University, L=Ithaca, ST=New York, C=NY
Serial number: 382d84ca
Valid from: Sat Nov 13 10:33:30 EST 1999 until: Fri Feb 11 10:33:30 EST 2000
Certificate fingerprints:
MD5: 69:A8:B8:7C:04:75:D1:4C:06:7C:13:11:D6:F6:69:37
SHA1: 28:CE:37:F9:E8:E3:84:95:7D:65:E7:2F:A2:E8:74:8F:A2:34:76:F3

You should import the certificate into your server's key store, a database that contains the certificates of parties known to the server.

A useful mechanism is the doPrivileged() method in class java.security.AccessController. This mechanism extends the set of permissions available to a thread to the permissions associated with the current protection domain. This allows you to write small pieces of trusted code. For example, you may not want to give client code access to arbitrary files. Instead, you can write some trusted code that accesses particular files only after it has made some security checks. This way, clients can only gain extra privileges in a controlled manner. You should compare this mechanism to Unix's setuid mechanism.
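The pattern of "check first, then act with the server's own permissions" might look like the sketch below. The class name, the method name, and the boolean capabilityOk parameter (standing in for a real capability check) are hypothetical. Note also that AccessController is deprecated for removal in recent Java releases, so this reflects the SDK 2 security model used in this project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.AccessController;
import java.security.PrivilegedAction;

class PrivilegedReadSketch {
    /**
     * Trusted server code that client code may call. The security check comes
     * first; only then is the file read with the permissions of this class's
     * protection domain, regardless of how restricted the caller is.
     */
    @SuppressWarnings("removal") // AccessController is deprecated in recent Java releases
    static byte[] readUnitFile(Path unitFile, boolean capabilityOk) {
        if (!capabilityOk) {
            throw new SecurityException("no valid capability for " + unitFile);
        }
        return AccessController.doPrivileged((PrivilegedAction<byte[]>) () -> {
            try {
                return Files.readAllBytes(unitFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Keeping the privileged block this small makes it easier to argue that client code cannot use it to reach files other than the one that was checked.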

Securing the storage server

You must secure the storage server by adding capabilities and by making sure that client code that runs on the server cannot do great damage to the server. To do this, you should

Arrange for each client to share a secret key with the server. Assume that each user is identified by a unique number. The client must store its key in a file named keys.client.n, which must be stored in the user's home directory; n is the client's number. So file keys.client.34 would contain the key for client 34. This file contains one line with the client's secret key (a number). The server stores the keys it shares with clients in a file keys.server, which is stored in the server's home directory. This file contains one line per client. Each line contains the client's number, a space, and the secret key that the client shares with the server.
Assume that when a process creates a Credentials object, it will pass the user's id as an argument to the constructor. ("The user" is the user on whose behalf the process is running.)
Make sure that the server will only handle requests from known clients (clients listed in keys.server).
Make sure that client code downloaded onto the server can access those storage units that the code can present capabilities for, but no others.
Make sure that client code downloaded onto the server cannot corrupt or kill the server by writing to files or variables without appropriate permissions, cannot leak information to the client that the client should not have access to, etc. Write a sensible policy file for the storage server and store it in the top-level directory of your storage server code. The file must be called policy.server. Your policy file should grant unrestricted permissions to the server itself, but should give restricted permissions to client code. In particular, you must enforce that all client code be signed. The public keys of known parties must be stored in file keystore.server, which must be stored in the top-level directory of your storage server code. You must protect your key store with password kscs415.

You should not change the application programming interface to the storage server.

Writeup

Make sure your writeup includes at least the following components:

A brief overview. From this overview, I should be able to grasp which mechanisms and algorithms you used.
Who did what?
A description of your authentication protocol
A description of your implementation of capabilities
A description of your code download policy
A description of how you prevent client code from corrupting the server
A description of the attacks your system protects against.

Software to use

Unfortunately, Visual J++ does not include the JDK 1.2 classes which provide some of the security mechanisms that you need to complete this project. You will therefore have to use Sun's SDK 2 Java implementation. Documentation for this implementation (compiler, etc.) can be found here. You will have to use the following tools:

javac, the Java compiler
java, the Java interpreter
rmic, the RMI stub compiler
rmiregistry, the RMI name server
keytool, a tool to manage key stores
jarsigner, a tool to sign the code stored in a Java archive

Use the following storage server code: storage-3-2.zip. This is the bare-bones storage server without the disk simulator, without popup threads, and without caching. This version includes:

A new version of interface StorageServer. This new interface adds the rexec() method.
Class il.ac.idc.storage.CryptKey. This class allows you to create secret keys and to encrypt and decrypt data by means of those keys.
Class il.ac.idc.storage.ClientCode. Clients can create subclasses of this class and pass instances of this subclass to rexec().
Interface OnServer. Client code that gets shipped to the server uses this interface to access data storage units.

Tips

I suggest you implement this project in the following order; get one piece to work before you start on the next one:
- Modify the storage server to use RMI instead of UDP.
- Implement authentication by means of secret keys.
- Implement rexec() and capabilities.
- Implement support for signed code.

Handing in the software

The code for this project is due 3 December, at midnight. Store your folder with the storage server code in \\goose.csuglab.cornell.edu\courses\cs415-fall99\submit-4.

Resources

RMI documentation:

Read the "Getting Started" tutorial, the RMI specification, and the documentation for rmic and rmiregistry.

Java security documentation

You don't have to read everything on this page to get started. You will need to know a little bit about the Java security architecture. In addition, the security trail of the Java tutorial is useful, in particular the sections about code signing.

Tool documentation (javac, java, rmic, rmiregistry, keytool, jarsigner)

Silberschatz and Galvin: Chapters 16, 19, and 20.