Puzzle 1: Socket programming

Goal

Your goal is to write a program that uses a simple protocol to communicate with WWW clients and servers on the Internet. As you solve the puzzle, you will learn to use TCP/IP and sockets for communicating over the Internet. You will learn that socket programming is not hard, but it takes some discipline and requires a complete understanding of each of the few functions you need to use.

Outline

The program you are supposed to write is a web-proxy - a program you put between a web-browser and web-servers. Usually the purpose of such a proxy is to create a more secure system (to get through firewalls, etc.) or to increase efficiency (by caching popular pages). The proxy you write is a dummy proxy in the sense that it only forwards requests to web-servers and passes their replies back to the browser.

Realize that the proxy serves as both client and server - it is a server to the web-browser and a client to the web-server. We strongly recommend that you write your program in three steps: first write a simple server that, upon a request from a web-browser, returns an HTML page (with header information); then a simple web-client that connects to a web-server; and finally combine the client and the server to make the proxy. Below you'll find some hints and pitfalls for each of the components you have to implement.

The Server

A server is a program that waits on some port for incoming calls (connections). When a connection arrives, this server handles the request completely before accepting another client. Real servers usually do not work this way: it is not hard to make a server handle multiple connections simultaneously, either by duplicating the server process (with fork) or by using threads. But this introduces many other problems (like preventing zombie processes), so in this assignment your server should handle only one connection at a time. Do not use fork or threads. Instead, make the queue of waiting clients large (like 20); its size is the backlog parameter of the listen system call.

Server outline:

create a socket
bind it to local address
listen
do forever
    accept a new call (gives you a new socket)
    ...
    close the accepted socket
loop
close the listening socket (if you have some way of breaking the loop)
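As a starting point, the first three steps of the outline (create, bind, listen) might look like the following C++ sketch. It assumes IPv4, listens on all local interfaces, and uses the backlog of 20 suggested above; the function name make_listening_socket is hypothetical, and the accept loop is left to you.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Hypothetical helper: creates a TCP socket bound to `port` on all local
// interfaces and puts it in the listening state.
// Returns the listening socket, or -1 on error.
int make_listening_socket(unsigned short port, int backlog = 20)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);          // create a socket
    if (s < 0) { perror("socket"); return -1; }

    sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));                   // zero-fill unused fields
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);         // any local interface
    addr.sin_port = htons(port);                      // network byte order

    if (bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("bind");
        close(s);
        return -1;
    }
    if (listen(s, backlog) < 0) {                     // queue up to `backlog` clients
        perror("listen");
        close(s);
        return -1;
    }
    return s;
}
```

Passing port 0 lets the kernel pick a free port, which can be handy for quick tests; for the assignment you pass the port given on the command line.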

System calls

To implement a server the following system calls are needed: socket, bind, listen, accept, read, write, close. Please look at the man pages for these functions for further information. Be sure to link in libxnet, libsocket and libnsl, as described in the man pages. For more information, do man -s 4 libxnet etc.

Hints

Always check the return values of the system calls you are using. You can use the assert function to make this easier.
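Note that assert is compiled out when NDEBUG is defined and does not tell you why a call failed. A small helper built on perror (the name check is hypothetical, and this is only one possible design) keeps error checks short while still reporting the errno message:

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: returns `result` unchanged if the system call
// succeeded (non-negative); otherwise prints `what` followed by the errno
// description (via perror) and terminates the program.
int check(int result, const char* what)
{
    if (result < 0) {
        perror(what);          // prints e.g. "bind: Address already in use"
        exit(EXIT_FAILURE);
    }
    return result;
}
```

For example: int s = check( socket(AF_INET, SOCK_STREAM, 0), "socket" );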

The server uses two sockets. One is created by the server itself and is used to listen for incoming calls. The other is returned by accept when a new connection arrives. When you have finished serving one connection, close the accepted socket (but keep the listening socket open).

Different machines use different byte orderings - big- or little-endian - so machines with different byte orderings need a common convention when they exchange data. The convention is that the network byte order is big-endian. To make life easier for programmers, the socket library has functions to convert from host byte order to network byte order and back: htonl (for 32-bit "long" integers), ntohl, htons (for 16-bit "short" integers), and ntohs. On big-endian machines these functions do nothing: they return their argument unchanged (and are often implemented as macros).
Example of use:
    serverAddress.sin_port = htons(port);
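To see what htons actually does, the sketch below (the helper port_bytes_on_wire is hypothetical) copies the converted value into a byte array. Port 20000 is 0x4E20, and after htons it is stored as the bytes 0x4E, 0x20 (most significant byte first) whatever the host's byte order.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper: stores the two bytes of `port` in `out` in the
// order they appear in memory after conversion to network byte order.
void port_bytes_on_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);        // big-endian representation
    memcpy(out, &net, sizeof(net));    // copy the raw bytes as stored
}
```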

You should zero-fill structures before setting their values - they usually have more fields than the ones you set (sockaddr_in, for example, contains padding in sin_zero).
Example:
    memset( &serverAddress, 0, sizeof(serverAddress) );   // bzero() does the same but is obsolete

You can test your server with Netscape or IE. If, for example, your server is running on a host called foo and is listening on port 20000, enter http://foo:20000/ in Netscape or IE.

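You need to start your answer to the client with the two lines given below (with no leading spaces), followed by a blank line. Only what you send after the blank line will appear in the browser window. HTTP ends each of these lines, including the blank one, with a carriage return and line feed (\r\n).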
    HTTP/1.0 200 OK
    Content-type: text/html

The Client

Write a simple client that takes one argument (a hostname) and connects to that machine. First look up the IP-address of the server, then create a socket and call connect to establish a connection. Then you can simply read from and write to the socket.

Client outline:

create a socket
find the IP-address of the server
connect to the server
...
close the socket
quit

System calls

To implement a client the following system calls are needed: socket, connect, read, write, close. For the address lookup you can use gethostbyname. Please look at the man pages for these functions for further information.

Hints

One hostname may have multiple IP-addresses. gethostbyname returns a pointer to a struct hostent whose h_addr_list field is an array of addresses. Just use the first address in the array.
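The client steps might look like the following C++ sketch (connect_to_host is a hypothetical name). It looks up the address before creating the socket, a small reordering of the outline so that a failed lookup leaves no socket to clean up, and copies only the first entry of h_addr_list. The port parameter defaults to 80; making it a parameter is just a convenience for testing against a local server.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Hypothetical helper: resolves `hostname` with gethostbyname, connects a
// TCP socket to the FIRST address returned, and returns the connected
// socket, or -1 on failure.
int connect_to_host(const char* hostname, unsigned short port = 80)
{
    hostent* he = gethostbyname(hostname);
    if (he == nullptr || he->h_addrtype != AF_INET || he->h_addr_list[0] == nullptr) {
        // gethostbyname reports errors in h_errno, not errno, so no perror here
        fprintf(stderr, "cannot resolve %s\n", hostname);
        return -1;
    }

    sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);  // first address only

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return -1; }
    if (connect(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("connect");
        close(s);
        return -1;
    }
    return s;
}
```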

HTTP daemons usually are on port 80. In this assignment always use port 80 to connect to web-servers.

The web-server you connect to will not reply unless you send it a request. Look at the requests Netscape/IE sends for examples (you can print them out in your server).
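For a first test without a browser, a minimal HTTP/1.0 request can be built by hand, as in this sketch (build_get_request is a hypothetical helper). Each line ends with \r\n and an empty line ends the header block. HTTP/1.0 does not require the Host header, but servers that host several sites under one IP-address rely on it, so it is worth sending.

```cpp
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.0 GET request for `path`
// on `host`. Lines end in CR LF; the empty line terminates the headers.
std::string build_get_request(const std::string& host, const std::string& path)
{
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}
```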

A call to write on a socket may not write the whole buffer at once (e.g. because the buffers are full); it returns the number of bytes actually written. To make sure all your data is written, you may have to call write again, in a loop, to write the remaining data. You should write a function to handle this: write_n_bytes.

The Proxy

In the final step you merge the client and the server. By default read is a blocking function. Since the proxy handles two connections - one to the web-browser (client) and one to a web-server - it cannot simply do two blocking reads at the same time (this is possible with threads, but then we introduce other problems such as locking). In this assignment you should make the sockets non-blocking and use the select system call to decide when (and where) to read. select is a blocking function that can wait on more than one socket at once. When one (or more) of the sockets has data, select returns. Then you check which sockets have data and read from them.

select function usage:

make the sockets non-blocking
initialize the select file descriptor (FD) set
while ( select(...) > 0 )
    if socket 1 has message
        read until no more data
        if error and errno is not EWOULDBLOCK
            then break
    if socket 2 has message (Note: not else-if)
        read until no more data
        if error and errno is not EWOULDBLOCK
            then break
    ...
    re-initialize the select set (important)
loop

System calls

Use fcntl to make the sockets non-blocking. Use the FD_ZERO, FD_SET and FD_ISSET macros to build and test the FD set passed to select. Please look at the man pages for these functions for further information.

Hints

Do not make the sockets non-blocking before you have connected successfully (it is simpler to have the connect/accept functions blocking).

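You can make a socket s non-blocking by calling the following. Use the POSIX flag O_NONBLOCK rather than the older FNDELAY, and read the current flags first so that the other flags are kept:
    fcntl( s, F_SETFL, fcntl( s, F_GETFL, 0 ) | O_NONBLOCK );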

The select function has 5 parameters. The first one is one more than the highest file descriptor in any of the sets: either choose a safe upper bound such as FD_SETSIZE, or pick one number higher than the highest current socket number. The next 3 parameters tell select which events you are interested in. We are only interested in when new data arrives, so only use the first one (for read) and pass NULL for the other two. The last parameter is the timeout. Set the timeout to 20 seconds.

You must check each socket for new data (using FD_ISSET). Do not write "if data for socket 1 then ... else if data for socket 2 then ...". If you do, you will handle data from only one socket per select call instead of all the available data.

select only tells you that new data has arrived. You should read all the data available from a socket. When there is no more data, read returns -1 and errno is EWOULDBLOCK (meaning that if the socket were blocking, read would block). Some systems report EAGAIN instead (on many systems the two have the same value), so check for both. On any other error you should exit the loop. At end-of-file (the other side has closed the connection), read returns 0.

select changes the read FD set (which you use to tell select which sockets you are interested in). Therefore you must reinitialize it every time before calling select again. The same goes for the timeout: some systems (e.g. Linux) overwrite it with the time remaining.

The Proxy interface

Your proxy should take one parameter - the port it listens on. It will be invoked like this:

proxy <port>
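Before using the argument, check that it is a valid port number. A sketch using strtol (parse_port is a hypothetical name):

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: converts a command-line argument to a TCP port.
// Returns the port (1-65535), or -1 if the argument is not a valid port.
int parse_port(const char* arg)
{
    if (arg == nullptr || *arg == '\0') return -1;
    char* end = nullptr;
    errno = 0;
    long value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > 65535)
        return -1;                      // overflow, trailing junk, or out of range
    return static_cast<int>(value);
}
```

In main you would call parse_port(argv[1]) when argc is 2, and otherwise print "usage: proxy <port>" and exit.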

How to test

You can configure which proxy to use in both Netscape and IE. Then test against some popular web-servers. Note that in Netscape (and probably IE as well), requests that go through proxies are slightly different from those sent when connecting directly; for example, the request line contains the full URL (GET http://www.example.com/ HTTP/1.0) rather than just the path. Some servers do not accept proxy requests (like http://www.cnn.com/). Since your proxy is a dummy (it does not change the request), you may have problems connecting to some servers - but that is fine.
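Because the proxy must decide which web-server to connect to, it has to find the host name in the browser's request. When a browser talks to a proxy, the request line normally carries the full URL, for example GET http://www.example.com/index.html HTTP/1.0, rather than just the path. The sketch below (extract_host is a hypothetical helper) pulls the host name out of such a line; it handles only plain http:// URLs and drops any :port suffix, since this assignment always connects on port 80.

```cpp
#include <string>

// Hypothetical helper: given a proxy request line such as
// "GET http://www.example.com/index.html HTTP/1.0", returns the host name
// ("www.example.com"), or an empty string if the URL is not of that form.
std::string extract_host(const std::string& request_line)
{
    const std::string scheme = "http://";
    std::string::size_type start = request_line.find(' ');
    if (start == std::string::npos) return "";
    ++start;                                             // first character of the URL
    if (request_line.compare(start, scheme.size(), scheme) != 0) return "";
    start += scheme.size();
    // the host ends at a port separator, a path, a space, or the line end
    std::string::size_type end = request_line.find_first_of(":/ \r\n", start);
    if (end == std::string::npos) end = request_line.size();
    return request_line.substr(start, end - start);
}
```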

What to submit

Please first read the overall submission instructions. In addition, please follow these instructions:

You are supposed to do this assignment in C++ on a UNIX machine. Return only the source code, in a file called proxy.cpp. You should use g++ to compile your code (at least we will, when grading).

Mail the source file to snorri@cs.cornell.edu as plain ASCII text (not as an attachment).
The subject line must be: CS519: Homework 1

Grading

Your proxy will be tested by running it and connecting to it with Netscape or IE, fetching pages from some selected web-servers (the same servers for everybody, of course). If the pages are fetched successfully, you get full credit. If it fails, you can come to the TA's office hours and explain what went wrong. If the problem is minor, you'll get 80%; otherwise 0.

General Hints

Probably the best introduction to socket hacking is at http://world.std.com/~jimf/papers/sockets/sockets.html.
W. Richard Stevens's book UNIX Network Programming is another excellent guide.
Remember to convert from host byte order to network byte order using the htons, htonl, ntohs, and ntohl calls, particularly when filling up the fields in the sockaddr_in structure.
Remember to error check each system call. Use perror to notify the user of errors.
Have the proxy print status messages to stdout, like: got connection, created socket successfully, etc. It is fine to leave them in the final version.
When a server binds to a port number, the number may not be usable again for a while, even after the server terminates, because connections that used it remain in the TCP TIME_WAIT state. To work around this, every time you start the server, use a port number different from the ones you have used in the recent past. You can see which port numbers are currently in use with the /usr/etc/netstat command.

This page was last updated on 01/01/02 06:40 PM