Lecture 24: sockets and DNS

socket API
- server and client sockets; create, bind, listen, accept, connect, send, recv, close
DNS
- domain name, authoritative nameserver, caching nameserver, root nameserver

Sockets

Sockets are the Unix system call API for interacting with TCP streams. Most programming languages expose the same API through a library.

There are two kinds of sockets:

A server socket accepts incoming connections on a well-known port.
A client socket is used to send and receive data after a connection is established.

We did some demos in Python. To create a server, one creates a socket and binds it to the well-known port:

python
>>> from socket import *
>>> server = socket()
>>> print(server.bind.__doc__)
bind(address)

Bind the socket to a local address.  For IP sockets, the address is a
pair (host, port); the host must refer to the local host. For raw packet
sockets the address is a tuple (ifname, proto [,pkttype [,hatype]])
>>> server.bind(('localhost', 44100))

Note that the host part of the address ('localhost') selects which of the local host's IP addresses to bind to.

At this point, the port has been reserved, but the socket isn't yet accepting connections. A telnet attempt does not connect, and a second socket cannot bind to the same address:

shell$ telnet localhost 44100

python
>>> s2 = socket()
>>> s2.bind(('localhost',44100))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 98] Address already in use
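
The same failure can be checked from a script rather than the interactive prompt. The following is only a sketch: the helper name second_bind_error is made up for illustration, and it binds to port 0 (letting the operating system pick a free port) instead of 44100 so that it does not collide with anything already running.

```python
import errno
import socket


def second_bind_error(host="127.0.0.1"):
    """Bind two sockets to the same address; return the errno of the second bind.

    Hypothetical helper for illustration. Returns None if the second bind
    unexpectedly succeeds.
    """
    first = socket.socket()
    second = socket.socket()
    try:
        first.bind((host, 0))              # port 0: the OS chooses a free port
        port = first.getsockname()[1]      # find out which port we were given
        try:
            second.bind((host, port))      # same (host, port) is already reserved
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_error()
    print(errno.errorcode.get(code, code))
```

The numeric value of the error differs between systems (98 on the Linux machine used in the demo), which is why the sketch compares against errno.EADDRINUSE rather than a literal number.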

To start accepting connections, we must call listen:

>>> print(server.listen.__doc__)
listen([backlog])

Enable a server to accept connections.  If backlog is specified, it must be
at least 0 (if it is lower, it is set to 0); it specifies the number of
unaccepted connections that the system will allow before refusing new
connections. If not specified, a default reasonable value is chosen.
>>> server.listen()

shell$ telnet localhost 44100
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

At this point, the TCP three-way handshake between the telnet client and the Python server has been performed. Notice that the client has been given an ephemeral port number:

shell$ netstat -t
tcp        0      0 localhost:50822         localhost:44100         ESTABLISHED
tcp        0      0 localhost:44100         localhost:50822         ESTABLISHED
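
The ephemeral port can also be observed from Python. This is a rough sketch under a few assumptions: the function name connect_and_inspect is invented here, the server listens on an OS-chosen port rather than 44100, and both ends live in one process for convenience. Note that connect succeeds even though the server never calls accept, because the handshake is completed by the operating system once the socket is listening.

```python
import socket


def connect_and_inspect():
    """Connect a client to a listening socket and report both endpoints.

    Hypothetical helper: returns (server address, client's local address,
    client's view of the remote address).
    """
    server = socket.socket()
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick the server port
    server.listen()
    server_addr = server.getsockname()

    client = socket.socket()
    client.connect(server_addr)        # handshake completes without accept()
    local = client.getsockname()       # (host, ephemeral port) chosen by the OS
    remote = client.getpeername()      # the server's (host, port)

    client.close()
    server.close()
    return server_addr, local, remote


if __name__ == "__main__":
    server_addr, local, remote = connect_and_inspect()
    print("server:", server_addr)
    print("client local:", local)
    print("client remote:", remote)
```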

We are using telnet as the client; if we were to write the client in Python (or look at the source code of telnet), we would create a client socket and call connect, providing the remote host and port:

>>> s = socket()
>>> print(s.connect.__doc__)
connect(address)

Connect the socket to a remote address.  For IP sockets, the address
is a pair (host, port).
>>> s.connect(('localhost',44100))

Calling connect makes the socket a client socket; had we called bind and listen it would be a server socket.

Returning to the server, we want to get a client socket representing our telnet client's connection. We wait for a connection by calling accept:

>>> print(server.accept.__doc__)
accept() -> (socket object, address info)

    Wait for an incoming connection.  Return a new socket
    representing the connection, and the address of the client.
    For IP sockets, the address info is a pair (hostaddr, port).
    
>>> sock,addr = server.accept()
>>> sock
<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 44100), raddr=('127.0.0.1', 50822)>
>>> addr
('127.0.0.1', 50822)

accept blocks until a new client connects; when one does, it creates and returns a client socket along with the client's address. In this case, the telnet client had already connected, so accept returned immediately. If we called accept again, it would block until a second client connected.

At this point, we can call send and recv on sock to send data to the client and receive data from the client.

>>> print(sock.send.__doc__)
send(data[, flags]) -> count

Send a data string to the socket.  For the optional flags
argument, see the Unix manual.  Return the number of bytes
sent; this may be less than len(data) if the network is busy.
>>> print(sock.recv.__doc__)
recv(buffersize[, flags]) -> data

Receive up to buffersize bytes from the socket.  For the optional flags
argument, see the Unix manual.  When no data is available, block until
at least one byte is available or until the remote end is closed.  When
the remote end is closed and all data is read, return the empty string.
>>> sock.send(bytes('hello\n','utf-8'))
6
[in the telnet window we see 'hello', and type 'sup?<enter>']
>>> sock.recv(3)
b'sup'
>>> sock.recv(5)
b'?\r\n'

Note that recv does not wait until all requested bytes are available. Instead, it blocks until at least one byte is available, and then returns however many bytes have arrived (up to the requested maximum).
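
Because recv may return fewer bytes than requested, code that needs a fixed number of bytes usually calls it in a loop. Here is a minimal sketch of that idea; recv_exact and demo are illustrative names, not part of the socket module, and socket.socketpair() is used as a stand-in for an established connection so the example needs no server. For the sending side, Python's sendall already loops over send until all data has been handed to the operating system.

```python
import socket


def recv_exact(sock, n):
    """Call recv repeatedly until n bytes have arrived or the peer closes.

    Hypothetical helper. Returns fewer than n bytes only if the remote end
    closed the connection first.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:                  # empty result: remote end closed
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    """Replay the exchange from the notes over a connected pair of sockets."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"sup?\r\n")         # sendall retries until everything is sent
        first = b.recv(3)              # at most 3 bytes
        rest = recv_exact(b, 3)        # exactly the remaining 3 bytes
        return first, rest
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```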

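Eventually, when we are done sending data, we can call close; this sends the FIN packet, and the send buffer is cleaned up once the FIN has been acknowledged. At the TCP level, sending FIN ends only our direction of the stream, so the peer can still send data to us. A socket that has been closed can no longer be used to send or receive, however; to stop sending while continuing to receive, call shutdown(SHUT_WR) instead, and call close afterwards.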
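
To stop sending while continuing to receive through the same socket object, Python exposes shutdown. The sketch below illustrates the behavior; half_close_demo is a made-up name, and socket.socketpair() stands in for a TCP connection (an assumption made so the example runs without a server).

```python
import socket


def half_close_demo():
    """Shut down our sending direction, then keep receiving from the peer.

    Hypothetical helper. Returns (data the peer received, the peer's
    end-of-stream read, the reply we received, whether a later send failed).
    """
    ours, peer = socket.socketpair()
    try:
        ours.sendall(b"last request")
        ours.shutdown(socket.SHUT_WR)    # we are done sending

        received = peer.recv(1024)       # the peer still gets our data
        eof = peer.recv(1024)            # then sees end of stream: b""

        peer.sendall(b"reply")           # the peer can still send to us
        reply = ours.recv(1024)

        try:
            ours.send(b"more")           # but we can no longer send
            send_failed = False
        except OSError:
            send_failed = True
        return received, eof, reply, send_failed
    finally:
        ours.close()                     # release both descriptors
        peer.close()


if __name__ == "__main__":
    print(half_close_demo())
```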

If we were writing a real server, we would want to simultaneously wait for new connections and wait for data from the connections we have already established. A very common pattern is to have a server loop that continuously calls accept and then starts a new thread to handle each connection.
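
A thread-per-connection server might look roughly like the following. This is a sketch, not a production design: the echo behavior, the names handle_client, serve and echo_once, and the max_clients parameter (added only so the demo loop can terminate) are all assumptions made for illustration.

```python
import socket
import threading


def handle_client(sock):
    """Echo everything the client sends until it closes its sending side."""
    with sock:
        while True:
            data = sock.recv(4096)
            if not data:               # empty result: client finished sending
                break
            sock.sendall(data)


def serve(server, max_clients=None):
    """Accept loop: start one thread per accepted connection.

    max_clients is a hypothetical knob so that a demo can stop the loop;
    a real server would normally loop forever.
    """
    served = 0
    while max_clients is None or served < max_clients:
        sock, _addr = server.accept()  # blocks until a client connects
        threading.Thread(target=handle_client, args=(sock,), daemon=True).start()
        served += 1


def echo_once(message):
    """Run the server for a single client and return what it echoes back."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))      # port 0: OS-chosen port for the demo
    server.listen()
    acceptor = threading.Thread(target=serve, args=(server, 1), daemon=True)
    acceptor.start()

    reply = b""
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)    # tell the server we are done sending
        while True:
            chunk = client.recv(4096)
            if not chunk:                  # server closed the connection
                break
            reply += chunk

    acceptor.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(echo_once(b"hello\n"))
```

With this structure the accept loop is never blocked by a slow client, because each connection's recv calls happen on that connection's own thread.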

DNS

We discussed DNS (the Domain Name System), an important application-layer protocol for converting human-readable domain names (such as cs.cornell.edu) to IP addresses. Domain names are hierarchical: the cs.cornell.edu domain is contained in the cornell.edu domain, which is in turn part of the .edu domain.
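
The hierarchy can be read directly off the name: each enclosing domain is obtained by dropping the leftmost label. A small sketch (the function name enclosing_domains is made up for illustration):

```python
def enclosing_domains(name):
    """List a domain name followed by each domain that contains it.

    Hypothetical helper; a trailing dot, if present, is ignored.
    """
    labels = name.rstrip(".").split(".")
    # Drop one leading label at a time: most specific domain first.
    return [".".join(labels[i:]) for i in range(len(labels))]


if __name__ == "__main__":
    print(enclosing_domains("cs.cornell.edu"))
    # prints ['cs.cornell.edu', 'cornell.edu', 'edu']
```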

To resolve a domain name (e.g. www.cs.cornell.edu), you first query your local caching nameserver. In principle, the caching nameserver then queries one of the root nameservers (a small number of well-known servers) to find the authoritative nameserver for .edu. The .edu nameserver is responsible for storing the IP addresses of the nameservers of all subdomains of the .edu domain.

When queried, the .edu nameserver responds with the IP address of the cornell.edu authoritative nameserver. The cornell.edu nameserver, in turn, responds with the IP address of the cs.cornell.edu nameserver. Finally, your caching nameserver queries the cs.cornell.edu nameserver to find the IP address of the www host.

The caching nameserver may cache any of the addresses that it learns as part of this process, so that later queries can skip the corresponding steps. Once it has resolved the name, the address is returned to the requesting application.
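
The resolution process can be simulated with a toy model. Everything in this sketch is an assumption made for illustration: the SERVERS table, the server identifiers (used here in place of nameserver IP addresses), the mail.cs.cornell.edu entry, and the addresses, which come from the 192.0.2.0/24 range reserved for documentation. No real DNS traffic is sent.

```python
# Toy delegation data (hypothetical). Each nameserver maps a domain either to
# another nameserver ("NS") or to a final address ("A").
SERVERS = {
    "root":       {"edu": ("NS", "edu-ns")},
    "edu-ns":     {"cornell.edu": ("NS", "cornell-ns")},
    "cornell-ns": {"cs.cornell.edu": ("NS", "cs-ns")},
    "cs-ns":      {"www.cs.cornell.edu": ("A", "192.0.2.10"),
                   "mail.cs.cornell.edu": ("A", "192.0.2.25")},
}


def suffixes(name):
    """Return the name and its enclosing domains, most specific first."""
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


class CachingResolver:
    """Simulated caching nameserver that walks the hierarchy from the root."""

    def __init__(self, servers, root="root"):
        self.servers = servers
        self.root = root
        self.addresses = {}      # cached final answers: name -> address
        self.delegations = {}    # cached referrals: domain -> nameserver
        self.queries = 0         # number of simulated queries sent

    def _ask(self, server, name):
        """Query one nameserver; return its most specific matching record."""
        self.queries += 1
        table = self.servers[server]
        for suffix in suffixes(name):
            if suffix in table:
                kind, value = table[suffix]
                return suffix, kind, value
        raise LookupError(f"{server} has no record for {name}")

    def resolve(self, name):
        if name in self.addresses:               # answer already cached
            return self.addresses[name]
        server = self.root
        for suffix in suffixes(name):            # deepest cached nameserver
            if suffix in self.delegations:
                server = self.delegations[suffix]
                break
        while True:
            suffix, kind, value = self._ask(server, name)
            if kind == "A":
                self.addresses[name] = value
                return value
            self.delegations[suffix] = value     # remember the referral
            server = value


if __name__ == "__main__":
    resolver = CachingResolver(SERVERS)
    print(resolver.resolve("www.cs.cornell.edu"), resolver.queries)
    print(resolver.resolve("mail.cs.cornell.edu"), resolver.queries)
```

In this model the first lookup sends four queries (root, .edu, cornell.edu, cs.cornell.edu), while the second sends only one, because the resolver has already cached which nameserver handles cs.cornell.edu.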