RMI and security

Due date: Friday, December 3rd

Note: you can download the required version of the storage server code from here:
storage-3-2.zip

What you need to do

In this project, you will add three features to the storage server:

  1. the use of Java's Remote Method Invocation (RMI) mechanism to implement client-server communication

  2. the ability to execute client code on the storage server

  3. security

Remote method invocations automate much of the tedious work that has until now been performed manually in the storage server. RMI also provides the necessary mechanism to execute client code on the storage server. The idea behind moving client computations to the storage server is that it is sometimes less expensive to move a computation to the storage units than it is to move the storage units to the client. Security has been a missing feature in all previous storage server implementations; allowing client code to execute on the server only increases the need for security mechanisms.

RMI

The current implementation of the storage server uses a very simple Remote Procedure Call (RPC) implementation. The implementation is simple for three reasons:

  1. The implementation includes hand-written code to convert objects that must travel to the server (and back) to an array of bytes that can be transported across the network. Converting data objects into a flat byte stream is called marshaling; the reverse process is called unmarshaling. Full-fledged RPC systems provide a stub compiler that can generate marshaling and unmarshaling code from an interface description.

  2. The network address of the server is more or less hardwired into the client code, which is undesirable. Normally, the server address is obtained from a name server through a process called binding.

  3. The storage server relies on the User Datagram Protocol (UDP) to deliver request and reply messages. UDP, however, is an unreliable protocol. RPC systems deliver messages reliably.
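The hand-written marshaling described in item 1 can be sketched as follows. This is a minimal illustration, not the code in the distribution: the ReadRequest class and its two fields are hypothetical and were chosen only to show the round trip between an object and a flat byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical request object; not part of the storage server distribution. */
class ReadRequest {
    final int unitId;
    final String owner;

    ReadRequest(int unitId, String owner) {
        this.unitId = unitId;
        this.owner = owner;
    }

    /** Marshal: flatten the fields into a byte array in a fixed order. */
    byte[] marshal() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(unitId);   // 4 bytes, big-endian
            out.writeUTF(owner);    // 2-byte length followed by the encoded characters
        }
        return bytes.toByteArray();
    }

    /** Unmarshal: read the fields back in exactly the order they were written. */
    static ReadRequest unmarshal(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int unitId = in.readInt();
            String owner = in.readUTF();
            return new ReadRequest(unitId, owner);
        }
    }
}
```

Every new field requires a matching change in both methods, which is exactly the tedium that a stub compiler or Java serialization removes.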

Java's RMI mechanism is an object-oriented flavor of RPC. The RMI distribution includes a stub compiler, rmic, which you can use to generate marshaling and unmarshaling code. In RMI's terminology, marshaling is called serialization and unmarshaling is called deserialization. Binding takes place by means of a server called the RMI registry. Internally, RMI uses TCP/IP to transport messages reliably.

Briefly, RMI works as follows. The implementor of the server writes a Java interface which defines the methods that the server exports to its clients. The implementor also provides a class that implements this interface. Finally, the implementor generates marshaling and unmarshaling code using the stub compiler. The service can now be made available by creating an instance of the server class and by registering this instance with the RMI registry under a well-known name. (This is done using Naming.rebind().)

Clients can query the RMI registry for the name of the service (using Naming.lookup()). If the registry knows the service, it will return to the client an object that implements the service interface. Such an instance is, quite appropriately, called a remote object. The client can invoke the methods defined in the server interface on the remote object. The method's actual parameters will automatically be marshaled into a message and transmitted to the server. At the server, a (popup!) thread is created that unmarshals the message and invokes the server's implementation of the method. After the method has executed, the return value is marshaled into a reply message which is sent back to the client. At the client, the result is unmarshaled and returned to the caller of the method.
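The register-then-look-up cycle described above can be sketched in a single process. The Greeter interface, its greet() method, and the name "greeter" are hypothetical. To keep the sketch self-contained it creates its own registry and calls rebind() and lookup() on the Registry object directly instead of going through Naming.rebind() and Naming.lookup(). It also assumes a recent JDK, where stubs are generated at run time and rmic is not needed.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiSketch {
    /** Hypothetical service interface; every remote method declares RemoteException. */
    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    /** Server-side implementation of the interface. */
    static class GreeterImpl implements Greeter {
        public String greet(String name) {
            return "hello, " + name;
        }
    }

    static String demo() throws Exception {
        // Keep the sketch on the loopback interface.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        Registry registry = LocateRegistry.createRegistry(0);  // 0 = any free port
        GreeterImpl impl = new GreeterImpl();
        // Exporting makes the object reachable and returns its stub.
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        try {
            registry.rebind("greeter", stub);                    // server side: register
            Greeter remote = (Greeter) registry.lookup("greeter"); // client side: look up
            return remote.greet("storage");                      // travels through the stub
        } finally {
            // Unexport so the RMI threads do not keep the JVM alive.
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

In the project itself the registry runs as a separate rmiregistry process and the client obtains the remote object with Naming.lookup(), as described above.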

So far, RMI does not differ from traditional RPC systems. The main difference is that RMI allows clients to pass objects as parameters to a remote object's methods. As with normal method calls, the caller of the method may supply an instance of a subclass of the method's formal argument. This is an interesting feature, because in a distributed environment the callee (i.e., the server) may not have access to the subclass's code. In this situation, RMI will automatically fetch the subclass's code from the client. This way, client code is shipped to the server.

The details of using RMI are explained in the RMI documentation. Read this documentation carefully. Work your way through the example.

In the RMI part of this project, you must modify the storage server to use RMI instead of UDP. You should do this without modifying the application programming interface (API) in class il.ac.idc.storage.StorageServer. This interface includes a new method

public byte[] rexec(Credentials cred, ClientCode cc).

This method allows client code to be moved to the server. ClientCode.java is provided in the new storage server distribution. This class defines a single (abstract) method

public abstract byte[] run(OnServer server).

The role of OnServer will be explained shortly. Clients can define a subclass of ClientCode and can pass an instance of this subclass as a parameter to rexec(). This subclass must implement the run() method, which will contain client-specific code that can operate on storage units (see below). The server's RMI class loader will download the subclass's code when the server unmarshals cc, that is, before the implementation of rexec() invokes cc's run() method. After the code has been downloaded, the client's run() method is invoked.
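The shape of a client-defined subclass can be sketched as follows. Everything here is a stand-in so that the sketch compiles on its own: the real ClientCode and OnServer come from the distribution, the readUnit() method is hypothetical, ClientCode is assumed to be serializable so that RMI can pass it by value, and the Credentials parameter of rexec() is left out.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class RexecSketch {
    /** Stand-in for the distribution's OnServer; readUnit() is a hypothetical method. */
    interface OnServer {
        byte[] readUnit(int unitId);
    }

    /** Stand-in for the distribution's ClientCode (assumed Serializable). */
    static abstract class ClientCode implements Serializable {
        public abstract byte[] run(OnServer server);
    }

    /** Client-defined subclass: measures a unit on the server instead of fetching it. */
    static class CountBytes extends ClientCode {
        private final int unitId;

        CountBytes(int unitId) {
            this.unitId = unitId;
        }

        @Override
        public byte[] run(OnServer server) {
            int length = server.readUnit(unitId).length;
            // Only these few bytes travel back to the client.
            return Integer.toString(length).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** What the server side amounts to once cc has been unmarshaled (credentials omitted). */
    static byte[] rexec(ClientCode cc, OnServer server) {
        return cc.run(server);
    }
}
```

The reply is a handful of bytes regardless of how large the storage unit is, which illustrates why moving the computation can be cheaper than moving the data.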

The client's run() method can operate on the data storage units managed by the server by invoking the following two methods on server:

Both methods are defined in the new interface OnServer. You must implement this interface.

Security

The ability to execute client code on the server introduces serious security concerns. Unless you take precautions, the client code can easily attack the server. For example, the client's code may be able to delete any file that the server has access to or it may hang up the server. Besides the security problems introduced by mobile code, the storage server suffers from several other, more basic problems.

To address these problems, you will use a variety of security techniques: secret-key cryptography, capabilities, code signing, and privileged code.

Secret-key cryptography

Secret-key cryptography allows two (or more) parties to communicate data in a confidential manner by encrypting and decrypting that data. Encryption and decryption are performed by means of an encryption algorithm that takes two inputs: the data to be encrypted or decrypted and a secret key:

cipher_text = encrypt(key, plain_text)

plain_text = decrypt(key, cipher_text)

The key is shared by all communicating parties. The encryption algorithm is usually known to the entire world, but the key should be kept secret. The main problem with secret-key cryptography is key distribution. The communicating parties need to agree on a secret key before they can communicate securely. To distribute this key in a secure manner to all parties, however, a secure channel is needed... In this project, we will not address this problem. In practice, it is usually solved by introducing a trusted third party or by using another type of cryptography, public-key cryptography, to distribute a key.

Secret-key cryptography can be used to preserve the confidentiality of messages that travel between a client and the storage server. We will assume, however, that confidentiality is not required, so you do not have to encrypt your messages. Instead, we will use secret-key cryptography for authentication purposes. Authentication is usually achieved by showing that you have or know something. In this project, you must arrange for every client to share a key with the server. A client can authenticate itself to the storage server by showing that it knows the secret key that it shares with the server. (Clearly, different clients must be given different keys.)

One way a client can show that it knows a key is by sending that key to the server. This is a bad idea: anyone listening in on the conversation can obtain the key. A better way is for the server to send a challenge (a random number) to the client and for the client to reply with the encrypted value of the challenge. The server can decrypt this reply and check that the decrypted value equals its challenge. This way, the client can show that it knows the secret key without revealing the key.
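The challenge-response exchange can be sketched with the standard javax.crypto classes. The choice of AES and of a 16-byte challenge (exactly one cipher block, so no padding or chaining mode is involved) is an assumption made for this sketch; the class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class ChallengeResponse {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Server: pick a fresh, unpredictable 16-byte challenge (one AES block). */
    static byte[] newChallenge() {
        byte[] challenge = new byte[16];
        RANDOM.nextBytes(challenge);
        return challenge;
    }

    /** Client: show knowledge of the key by encrypting the challenge with it. */
    static byte[] respond(byte[] key, byte[] challenge) throws Exception {
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        return aes.doFinal(challenge);
    }

    /** Server: decrypt the reply and compare it with the challenge that was sent. */
    static boolean verify(byte[] key, byte[] challenge, byte[] response) throws Exception {
        if (response.length != 16) {
            return false;  // not a single AES block
        }
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"));
        // isEqual compares in constant time.
        return MessageDigest.isEqual(challenge, aes.doFinal(response));
    }
}
```

The key must be 16, 24, or 32 bytes long. The server must use each challenge only once; otherwise a recorded response could be replayed.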

The problem with the scheme above is that it requires a three-way handshake between the client and the server. We do not want to perform such a handshake for each request issued by the client. A simpler scheme, one that uses only a single message, can be used if the client and server clocks are loosely synchronized. The main problem with this simpler scheme is that it is sensitive to replay attacks: another process can record the message sent by the client to the server. Later, this third process can replay the message to the server. If you implement this scheme, you must deal with this type of attack.
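One possible shape for the single-message scheme is sketched below; it is an illustration under stated assumptions, not the required design. The client sends its clock value together with a keyed hash (HMAC) of that value, rather than an encrypted value. The server rejects timestamps outside a tolerance window and remembers the timestamps it has accepted inside the window, so that a replayed message is refused. The 30-second window and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class TimestampAuth {
    static final long MAX_SKEW_MS = 30_000;          // assumed clock tolerance
    private final byte[] key;
    private final Set<Long> seen = new HashSet<>();  // timestamps accepted within the window

    TimestampAuth(byte[] key) {
        this.key = key.clone();
    }

    /** Client: compute HMAC-SHA256 over the 8-byte timestamp. */
    static byte[] tag(byte[] key, long timestampMs) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(ByteBuffer.allocate(Long.BYTES).putLong(timestampMs).array());
    }

    /** Server: accept a (timestamp, tag) pair only if it is fresh, authentic, and new. */
    synchronized boolean accept(long timestampMs, byte[] tag, long nowMs) throws Exception {
        if (Math.abs(nowMs - timestampMs) > MAX_SKEW_MS) {
            return false;  // outside the window: stale, or clocks too far apart
        }
        if (!MessageDigest.isEqual(tag(key, timestampMs), tag)) {
            return false;  // sender does not know the key
        }
        // Entries outside the window would fail the first check anyway, so forget them.
        seen.removeIf(t -> Math.abs(nowMs - t) > MAX_SKEW_MS);
        return seen.add(timestampMs);  // false if this timestamp was already accepted
    }
}
```

A real request would also cover the request contents with the tag and add a counter or nonce, so that two legitimate requests issued in the same millisecond do not collide.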

Capabilities

An access control mechanism allows one to specify who has access to what. One way to do this is to store an Access Control List (ACL) with each object. An ACL lists who has access to the resource that the ACL is associated with. Before a process is given access to the resource, the operating system checks if the user on whose behalf the process runs is listed in the ACL (explicitly or implicitly, as a member of some group). Windows NT and Solaris use ACLs to protect files.

In this project, we will use another mechanism, capabilities. A capability is like a ticket: if you have it, you get in, otherwise you are kept out. The difference between these software tickets and paper tickets is that copying is easy and legal. The owner of a capability is allowed to copy its capability and to hand the copy to another party. From then on, the receiver of the copy can access the resource too.

You are to modify the storage server so that it creates a capability for each storage unit that it creates. This capability must be returned to the creating process. When a process wishes to access a storage unit, it must present its capability for that storage unit to the storage server. Since other processes may listen in on the conversation, capabilities must be encrypted, or else they can be used by other processes. When the server receives a request to access a storage unit, it must check that the capability stored in that request gives access to the storage unit.

Capabilities are conveniently represented as (large) random numbers. If the set from which the random number is chosen is sufficiently large, then the probability that an adversary can guess a capability is negligible.
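Generating and checking such random capabilities can be sketched as follows. The 128-bit size, the integer unit identifiers, and the CapabilityTable class are assumptions made for this sketch. With 128 random bits, a single guess succeeds with probability 2^-128.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class CapabilityTable {
    private static final int CAPABILITY_BYTES = 16;  // 128 bits
    private final SecureRandom random = new SecureRandom();
    private final Map<Integer, byte[]> byUnit = new HashMap<>();

    /** Called when a storage unit is created; the result is handed to the creator. */
    synchronized byte[] create(int unitId) {
        byte[] capability = new byte[CAPABILITY_BYTES];
        random.nextBytes(capability);  // cryptographically strong, unlike java.util.Random
        byUnit.put(unitId, capability);
        return capability.clone();     // the caller cannot alter the stored copy
    }

    /** Called on every access: does the presented capability match this unit? */
    synchronized boolean check(int unitId, byte[] presented) {
        byte[] expected = byUnit.get(unitId);
        return expected != null
                && presented != null
                && MessageDigest.isEqual(expected, presented);  // constant-time comparison
    }
}
```

SecureRandom matters here: the output of an ordinary pseudo-random generator can be predicted from a few observed values, which would make capabilities guessable.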

Executing client code safely

The server should not execute malicious client code. In particular, client code should be able to access only those storage units for which it can present a capability. SDK 2 provides a mechanism that assigns permissions to protection domains. When a Java virtual machine downloads a class, it creates a new protection domain and runs the downloaded code in that protection domain. The thread that executes the code in the new protection domain can do everything that is allowed by the permissions of the protection domain. When the thread enters another protection domain - e.g., when client code invokes a server routine - it executes with the intersection of the permissions of both domains.

A policy file assigns permissions to downloaded code. Code that originates from different sources, called code bases, can be assigned different permissions. Permissions can also depend on who signed the code that is downloaded. You must write a sensible policy file for the virtual machine that executes the storage server code (and the downloaded client code). This policy file should give the server full access to the files it manages. Client code should run in a very restricted mode. In particular, a client should not be able to corrupt server data structures and should be given only restricted access to the files that contain storage units.

Code signing is based on public-key cryptography. With this type of cryptography, the sender and the recipient of a message need not share a secret key (as with secret-key cryptography). With public-key cryptography, each party generates a key pair, which consists of a private key and a public key. The public key is given to anyone who wants to have it. You can publish your public key on your web page. The private key, in contrast, should not be given away.

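Public-key cryptography can be used to encrypt data in the following way: the sender encrypts the message with the recipient's public key, and only the recipient can decrypt it, using her private key. (PUBKr is the public key of the recipient; PRIVKr is the private key of the recipient.)

cipher_text = encrypt(PUBKr, plain_text)

plain_text = decrypt(PRIVKr, cipher_text)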

Public-key cryptography can also be used to sign data. Here the goal is not to hide the data, but to prove that the data was sent by a particular person or organization. This is done by attaching a digital signature to the data. A sender can sign her message by encrypting that message with her private key; the result of that operation is the digital signature. Clearly, only the sender can produce the digital signature, because only the sender knows her private key. After creating the signature, the sender sends both her message and the signature to the receiver. The receiver verifies that the message was sent by the sender by decrypting the signature using the sender's public key. The result should match the message that the sender sent along with her signature.

Summarizing, this is how signing works (in theory). (PRIVKs is the private key of the sender; PUBKs is the public key of the sender).

signature = encrypt(PRIVKs, message)

The sender transmits both message and signature. The receiver accepts the message if decrypt(PUBKs, signature) equals message.

Public-key operations are expensive. In practice, the sender would not sign her entire message, but a small hash of that message. Computing the hash is not expensive and since the hash is small, signing it isn't very expensive either.
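The hash-then-sign approach is what the standard java.security.Signature class does for an algorithm name such as "SHA256withRSA": it hashes the message with SHA-256 and signs only the hash. The sketch below is a minimal illustration; the choice of RSA with 2048-bit keys and the class name are assumptions, and in this project the actual signing of client code is done with the tools listed under Resources rather than by hand.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class SigningSketch {
    /** Generate a fresh private/public key pair. */
    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** Sender: hash the message and sign the hash with the private key. */
    static byte[] sign(PrivateKey privateKey, byte[] message) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(message);
        return signer.sign();
    }

    /** Receiver: check the signature against the message using the public key. */
    static boolean verify(PublicKey publicKey, byte[] message, byte[] signature)
            throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(message);
        return verifier.verify(signature);
    }
}
```

Verification fails both when the message was altered after signing and when the signature was produced with a different private key.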

Your policy file must enforce that all client code be signed by known parties. You should at least be able to execute code signed by the "CS415 tester". To be able to verify the digital signature of the CS415 tester, your server needs to know the public key that corresponds to the private key that the CS415 tester uses to sign his code. This public key is encapsulated in this (self-signed) certificate. The human-readable version of this certificate is as follows:

Owner: CN=CS415 tester, OU=Dept. of Computer Science, O=Cornell University, L=Ithaca, ST=New York, C=NY
Issuer: CN=CS415 tester, OU=Dept. of Computer Science, O=Cornell University, L=Ithaca, ST=New York, C=NY
Serial number: 382d84ca
Valid from: Sat Nov 13 10:33:30 EST 1999 until: Fri Feb 11 10:33:30 EST 2000
Certificate fingerprints:
MD5: 69:A8:B8:7C:04:75:D1:4C:06:7C:13:11:D6:F6:69:37
SHA1: 28:CE:37:F9:E8:E3:84:95:7D:65:E7:2F:A2:E8:74:8F:A2:34:76:F3

You should import the certificate into your server's key store, a database that contains the certificates of parties known to the server.

Securing the storage server

You must secure the storage server by adding capabilities and by making sure that client code that runs on the server cannot do great damage to the server. To do this, you should

You should not change the application programming interface to the storage server.

Writeup

Make sure your writeup includes at least the following components:

Software to use

Unfortunately, Visual J++ does not include the JDK 1.2 classes, which provide some of the security mechanisms that you need to complete this project. You will therefore have to use Sun's SDK 2 Java implementation. Documentation for this implementation (compiler, etc.) can be found here. You will have to use the following tools:

Use the following storage server code: storage-3-2.zip. This is the bare-bones storage server without the disk simulator, without popup threads, and without caching. This version includes:



Tips

Handing in the software

The code for this project is due 3 December, at midnight. Store your folder with the storage server code in \\goose.csuglab.cornell.edu\courses\cs415-fall99\submit-4.

Resources

RMI documentation:

Read the "Getting Started" tutorial, the RMI specification, and the documentation for rmic and rmiregistry.

Java security documentation

You don't have to read everything on this page to get started. You will need to know a little bit about the Java security architecture. In addition, the security trail of the Java tutorial is useful, in particular the sections about code signing.

Tool documentation (javac, java, rmic, rmiregistry, keytool, jarsigner)

Silberschatz and Galvin: Chapters 16, 19, and 20.