Puzzle 1: Socket programming

Goal

Your goal is to write a program that uses a simple protocol (HTTP) to communicate with WWW clients and servers on the Internet. As you solve the puzzle, you will learn to use TCP/IP and sockets for communicating over the Internet. You will learn that socket programming is not hard, but it takes some discipline and requires you to understand completely each of the few functions you need to use.

Outline

The program you are supposed to write is a web-proxy: a program you can put between a web-browser and web-servers. Usually the purpose of such a proxy is to make a system more secure (for example, to get through firewalls) or to increase efficiency (for example, by caching popular pages). The proxy you write is a dummy proxy in the sense that it only forwards requests to web-servers and relays their replies back to the browser.

Realize that the proxy serves as both client and server: it is a server to the web-browser and a client to the web-server. We strongly recommend that you write your program in three steps. First, write a simple server that, upon a request from a web-browser, returns an HTML page (with header information). Then write a simple web-client that connects to a web-server. Finally, combine the client and the server to make the proxy. Below you'll find some hints and pitfalls for each of the components you have to implement.

The Server

A server is a program that waits on some port for incoming calls (connections). In this assignment, the server handles each request completely before accepting another client. Real servers usually do not work this way: it is not hard to make a server handle multiple connections simultaneously, either by duplicating the server process (using fork) or by using threads. But this usually introduces many other problems (like preventing zombie processes), so in this assignment your server should handle only one connection at a time. Do not use fork or threads. Instead, make the queue of waiting clients large (e.g., 20); its size is the backlog parameter of the listen system call.

Server outline:

create a socket
bind it to local address
listen
do forever
    accept a new call (gives you a new socket)
    ...
    close the accepted socket

loop
close the listening socket (if you have some way of breaking the loop)

System calls

To implement a server you need the following system calls: socket, bind, listen, accept, read, write and close. Please look at the man pages for these functions for further information. Be sure to link in libxnet, libsocket and libnsl, as described in the man pages (on Solaris these are separate libraries; on Linux the socket functions are part of the C library). For more information, run man -s 4 libxnet, etc.

The Client

Write a simple client that takes one argument (a hostname) and connects to that machine. First you must look up the IP-address of the server. Then you create a socket and call connect to establish a connection. Then you can simply read from and write to the socket.

Client outline:

create a socket
find the IP-address of the server
connect to the server
...

close the socket
quit

System calls

To implement a client you need the following system calls: socket, connect, read, write and close. For the address lookup you can use gethostbyname. Please look at the man pages for these functions for further information.

The Proxy

In the final step you merge the client and the server. By default, read is a blocking call: it waits until data arrives. Since the proxy handles two connections, one to the web-browser (client) and one to a web-server, it cannot simply do two blocking reads at the same time (this is of course possible using threads, but then we introduce other problems such as locking). In this assignment you should make the sockets non-blocking and use the select system call to decide when (and from which socket) to read. select is a blocking call that can wait on more than one socket at once. When one or more of the sockets have data (or have been closed by the other end), select returns. You then check which sockets are ready and read from them.

select function usage:

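make the sockets non-blocking
initialize the select file descriptor (FD) set
while (select(...) > 0)

  if socket 1 has data
    read until no more data, writing what you read to socket 2
    if read returns 0 (connection closed), or fails with errno
    other than EWOULDBLOCK (or EAGAIN)
      then break
  if socket 2 has data (Note: not else-if)
    read until no more data, writing what you read to socket 1
    if read returns 0 (connection closed), or fails with errno
    other than EWOULDBLOCK (or EAGAIN)
      then break
  ...
  re-initialize the select set (important: select modifies it)
loop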

System calls

Use fcntl (with F_GETFL, F_SETFL and O_NONBLOCK) to make the sockets non-blocking. Use FD_ZERO, FD_SET and FD_ISSET to build and test the descriptor sets passed to select. Please look at the man pages for these functions for further information.

The Proxy interface

Your proxy should take one parameter: the port it listens on. It will be invoked like this:

        proxy <port>

How to test

You can configure both Netscape and IE to use a proxy. Then you should test against some popular web-servers. Note that in Netscape (and probably IE as well) requests that go through a proxy are slightly different from requests sent directly to a server: the request line carries the full URL rather than just the path. Some servers do not accept proxy requests (like http://www.cnn.com/). Since your proxy is a dummy (it does not change the request), you may have problems connecting to some servers, but that is fine.

What to submit

Please first read the overall submission instructions. In addition, please follow these instructions:

You are supposed to do this assignment in C++ on a UNIX machine. Submit only the source code, in a file called proxy.cpp. Use g++ to compile your code (at least, we will when grading).

Mail the source file to snorri@cs.cornell.edu as plain ASCII text (not as an attachment).
The subject line must be: CS519: Homework 1

Grading

Your proxy will be tested by running it and connecting to it from Netscape or IE to fetch pages from some selected web-servers (the same servers for everybody, of course). If the pages are fetched successfully, you get full credit. If it fails, you can come to the TA's office hours and explain what went wrong. If the problem is minor, you'll get 80%; otherwise, 0.

General Hints

  1. Probably the best introduction to socket hacking is at http://world.std.com/~jimf/papers/sockets/sockets.html.
  2. W. Richard Stevens's book UNIX Network Programming is another excellent guide.
  3. Remember to convert between host byte order and network byte order using the htons, htonl, ntohs and ntohl calls, particularly when filling in the fields of the sockaddr_in structure.
  4. Remember to error-check every system call. Use perror to report errors to the user.
  5. Let the proxy print status messages to stdout, like "got connection", "created socket successfully", etc. It is fine to leave them in the final version.
  6. When a server binds to a port number, that number cannot be reused for a while, even after the server terminates, because connections from the old server linger in the TIME_WAIT state. To work around this, use a port number different from the ones you have used in the recent past every time you start the server, or set the SO_REUSEADDR socket option before calling bind. You can see which port numbers are currently in use with the /usr/etc/netstat command.

This page was last updated on  01/01/02 06:40 PM