Transport layer protocols sit atop the network layer and provide additional services. UDP is a very thin wrapper; it only provides multiplexing. TCP uses retransmission to provide reliable in-order delivery of a stream of data.
IP provides a facility for delivering packets to hosts. However, we often want to deliver data to applications running on a given host. Moreover, the receivers may wish to deliver responses; those responses must be returned to the actual process that sent the request.
TCP and UDP both enable multiplexing by including a port number in their respective headers. To receive packets, applications can bind to a given port on their local host; the operating system will only deliver packets that are addressed to that port. For example, a web server may bind to port 80, while a mail server may bind to port 993.
When a client initiates communication over TCP or UDP, the operating system allocates an unused port on the client machine so that the server can send back responses. These ports are sometimes referred to as anonymous (or ephemeral) ports.
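As a rough illustration of binding and anonymous ports, the sketch below uses Python's standard `socket` module over the loopback interface. The choice of UDP, the `ping`/`pong` payloads, and letting the operating system pick the server's port (by binding to port 0) are assumptions made so the example runs anywhere; a real service would bind a well-known port instead.

```python
import socket

# "Server": bind to a port so the OS knows where to deliver datagrams.
# Port 0 asks the OS to pick a free port (an assumption for this demo).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_port = server.getsockname()[1]

# "Client": never calls bind(); the OS assigns it a port on first send.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"ping", ("127.0.0.1", server_port))

# The server learns the client's port from the datagram's source address
# and uses it to return the response to the right process.
data, client_addr = server.recvfrom(1024)
server.sendto(b"pong", client_addr)
reply, _ = client.recvfrom(1024)

client_port = client.getsockname()[1]
print("server port:", server_port, "client port:", client_port, "reply:", reply)

client.close()
server.close()
```

The client's port number is different on each run, which is what makes it "anonymous": only the server needs a port that others know in advance.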
The User Datagram Protocol (UDP) is a thin wrapper around IP that supports multiplexing. A UDP header contains a source and destination port number, the length of the data, and a checksum.
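To make the header layout concrete, here is a minimal sketch that packs and unpacks the four header fields as 16-bit values in network byte order. The function names are hypothetical, the length field is taken to cover the 8-byte header plus the payload, and the checksum is simply left at 0 rather than computed, so this is not a complete UDP implementation.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    # Length counts the header itself plus the payload.
    # The checksum is not computed here (assumption: 0 as a placeholder).
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload

datagram = build_udp_datagram(54321, 53, b"hello")
print(parse_udp_datagram(datagram))
```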
Datagrams are self-contained messages whose size is fixed when they are sent. Think of them as postcards.
Datagrams are convenient when sending small, self-contained messages whose size is known in advance. However, many applications require interaction that is more akin to a conversation: each side may send some data, wait for a response, and then send some more data, or may send a very large stream of continuous data. Streams are open-ended communication channels.
The Transmission Control Protocol (TCP) provides a stream-oriented interface. Applications wishing to communicate using TCP first establish a connection. Then, over time, they can append data to the stream that they send to a remote endpoint. The remote endpoint can repeatedly read data from the stream. TCP guarantees that the data read by the remote endpoint is the same as the data written by the source.
TCP streams are bidirectional; once a connection is established from a client to a server, both parties can read and write data. The data written by the client will be read by the server and vice versa.
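The stream interface can be sketched with Python's standard `socket` module. In this illustrative loopback example (the uppercase "protocol", the tiny 4-byte reads, and the helper name `serve_once` are assumptions), the client appends to the stream in two writes, the server reads it back in whatever chunks arrive, and the reply travels over the same connection in the other direction.

```python
import socket
import threading

def serve_once(listener):
    # Accept one connection, read the whole stream, reply in upper case.
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(4)   # deliberately tiny reads
            if not chunk:          # empty read: the client finished writing
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks).upper())

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
listener.listen(1)
listener.settimeout(5)
port = listener.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(listener,))
thread.start()

client = socket.create_connection(("127.0.0.1", port), timeout=5)
client.sendall(b"hello, ")         # two separate writes...
client.sendall(b"world")           # ...form one continuous stream
client.shutdown(socket.SHUT_WR)    # done writing, but still able to read

reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:
        break
    reply += chunk

client.close()
thread.join()
listener.close()
print(reply)
```

Note that the boundaries between the client's writes are not preserved: the server sees a sequence of bytes, not two messages.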
TCP provides reliable delivery by requiring an acknowledgement from the remote endpoint that each packet was received. If a TCP packet is sent and not acknowledged within a certain amount of time, the sender will resend the packet. It will continue to do so until receipt is acknowledged.
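TCP communications are divided into segments, each of which is identified by a sequence number; the first packet from host A to B might have sequence number 0, the second one 1, and so on. (This numbering is a simplification: real TCP sequence numbers count bytes rather than segments, and start from an initial value chosen by each endpoint during the handshake.)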
Because TCP endpoints must maintain state (which packets have been sent, which have been acknowledged, and so on), TCP requires a connection to be established. To begin a connection, the endpoints perform a 3-way handshake: the initiator sends a synchronize (SYN) packet. The receiver responds with an acknowledgement (ACK) of the SYN and its own SYN packet. These are usually combined into a single SYN/ACK packet. The initiator then acknowledges the receiver's SYN with its own ACK.
|A -> B||A0||SYN|
|B -> A||B0||SYN/ACK(A0)|
|A -> B||A1||ACK(B0)|
At this point the connection is established; both sides know that the other side is ready to receive packets. Either side can now send messages:
|A -> B||A2||Data ("hello!")|
|B -> A||B1||ACK(A2)|
|B -> A||B2||Data ("sup?")|
|A -> B||A3||ACK(B2)|
Messages can be piggybacked: if two messages are going in the same direction, they can be combined into a single message. In the above example, B can piggyback its ACK of A2 onto its Data packet:
|A -> B||A2||Data ("hello!")|
|B -> A||B1||Data ("sup?") / ACK(A2)|
|A -> B||A3||ACK(B1)|
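The exchange can also be written down as a small trace generator. This is only a bookkeeping sketch under the simplified per-packet numbering used here; the `Packet` and `Endpoint` classes are hypothetical and nothing is sent over a network. Each endpoint numbers its own packets, and an acknowledgement names the packet it refers to.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    sender: str
    seq: int
    kind: str
    ack: Optional[str] = None   # label of the packet being acknowledged
    payload: str = ""

    @property
    def label(self):
        return f"{self.sender}{self.seq}"

class Endpoint:
    def __init__(self, name):
        self.name = name
        self.next_seq = 0       # simplified: one number per packet sent

    def send(self, kind, ack=None, payload=""):
        packet = Packet(self.name, self.next_seq, kind,
                        ack.label if ack else None, payload)
        self.next_seq += 1
        return packet

a, b = Endpoint("A"), Endpoint("B")
syn = a.send("SYN")                                    # handshake, step 1
syn_ack = b.send("SYN/ACK", ack=syn)                   # handshake, step 2
ack = a.send("ACK", ack=syn_ack)                       # handshake, step 3
hello = a.send("Data", payload="hello!")
sup = b.send("Data/ACK", ack=hello, payload="sup?")    # piggybacked ACK
last = a.send("ACK", ack=sup)

exchange = [syn, syn_ack, ack, hello, sup, last]
for p in exchange:
    print(p.label, p.kind, p.ack or "", p.payload)
```

Because B combines its data and its acknowledgement into one packet, B sends one packet fewer than in the version without piggybacking, and A's final acknowledgement refers to that combined packet.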
If a segment is not acknowledged within a timeout, the segment is resent. This can happen either because the initial transmission was dropped, or because the acknowledgement was dropped. In the latter case the receiver may receive duplicate segments; it will simply discard the duplicates.
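A toy simulation can show both failure cases. The sketch below assumes the simplest possible sender, which keeps only one segment outstanding at a time, and a hypothetical link whose losses are specified up front (`drop_data` and `drop_acks` are illustrative parameters, not part of any real API). A "timeout" is modeled as simply going around the loop again.

```python
def simulate(messages, drop_data=(), drop_acks=()):
    """Send messages one at a time over a lossy link.

    drop_data / drop_acks hold the 0-based indices of the data
    transmissions / acknowledgements that the link loses.
    """
    delivered = []       # what the receiving application sees
    next_expected = 0    # receiver state: next sequence number it wants
    transmissions = 0    # data segments put on the wire, including resends
    acks_sent = 0
    duplicates = 0       # segments the receiver discarded

    for seq, payload in enumerate(messages):
        acked = False
        while not acked:
            attempt = transmissions
            transmissions += 1
            if attempt in drop_data:
                continue             # segment lost: timeout, then resend

            if seq == next_expected:
                delivered.append(payload)
                next_expected += 1
            else:
                duplicates += 1      # already received: discard, re-ack

            ack_index = acks_sent
            acks_sent += 1
            if ack_index in drop_acks:
                continue             # ack lost: timeout, then resend
            acked = True
    return delivered, transmissions, duplicates

messages = ["hello!", "sup?", "bye"]
delivered, transmissions, duplicates = simulate(
    messages, drop_data={1}, drop_acks={0})
print(delivered, transmissions, duplicates)
```

In this run the first acknowledgement and the first retransmission are both lost, so the first segment crosses the link twice; the receiver delivers it once and discards the second copy.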
When either endpoint is done sending data, it should inform the other end by closing the connection. This will cause a FIN packet to be sent (which should be acknowledged by the remote endpoint). When both endpoints have sent a FIN packet and had it acknowledged, the connection is closed.
If a sender has a large amount of data to send over TCP, and if it waits for the acknowledgement of the first segment before transmitting the second, then it will be wasting a lot of bandwidth, especially if transmitting over a high-latency connection. (Bandwidth is the amount of data that can be transmitted over a given link in a given time period; latency is the total time it takes for a single unit of data to be transmitted.)
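A quick back-of-the-envelope calculation shows how much is wasted. The numbers below (1000-byte segments, a 100 ms round trip, a 100 Mbit/s link) are made up for illustration, and the sketch ignores the time needed to put each segment on the wire.

```python
def stop_and_wait_throughput_bps(segment_bytes, rtt_ms):
    """Bits per second when only one segment is sent per round trip."""
    return segment_bytes * 8 * 1000 / rtt_ms

def segments_to_fill_link(link_bps, rtt_ms, segment_bytes):
    """Segments that must be in flight to keep the link busy for a round trip."""
    bits_in_flight = link_bps * rtt_ms / 1000
    return bits_in_flight / (segment_bytes * 8)

SEGMENT_BYTES = 1000       # illustrative values, not measurements
RTT_MS = 100
LINK_BPS = 100_000_000

throughput = stop_and_wait_throughput_bps(SEGMENT_BYTES, RTT_MS)
utilization = throughput / LINK_BPS
in_flight = segments_to_fill_link(LINK_BPS, RTT_MS, SEGMENT_BYTES)

print(f"{throughput:.0f} bit/s, {utilization:.2%} of the link")
print(f"{in_flight:.0f} segments in flight would fill the link")
```

With these assumed numbers, waiting for each acknowledgement yields about 80 kbit/s, or 0.08% of the link; roughly 1250 segments would need to be in flight at once to use the link fully.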
However, if it tries to transmit too much data, it may overwhelm the links in between. This could lead to packet loss (because overloaded routers may simply discard packets, for example), and a high rate of retransmission.
Ideally, the sender will send just enough data at a time to keep the connection saturated but not oversaturated. TCP uses an adaptive algorithm to determine the right amount of data to send.
The number of sent but unacknowledged packets is called the window size. TCP adapts its window size using linear increase with exponential backoff with slow start:
- Slow start: initially the window size is a small value (such as 1). As long as no packets are dropped, the window size increases exponentially.
- Exponential backoff: after any packet is dropped (i.e. an acknowledgement is not received before the timeout expires), the sender decreases the window size exponentially (for example, by halving it).
- Linear increase: as soon as one packet has been lost, the slow start period is over; successful deliveries only increase the window size linearly.
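The three rules above can be sketched as a small state update applied once per round trip. This is a simplified model, not what any particular TCP implementation does: doubling during slow start, halving on loss, and adding one segment per round afterwards are assumptions, as are the function names.

```python
def next_window(window, in_slow_start, loss):
    """Return (new_window, still_in_slow_start) after one round trip."""
    if loss:
        # Exponential backoff: shrink the window and leave slow start for good.
        return max(1, window // 2), False
    if in_slow_start:
        # Slow start: exponential growth while nothing has been lost.
        return window * 2, True
    # Linear increase: one more segment per successful round.
    return window + 1, False

def window_history(loss_rounds, rounds):
    """Window size used in each round, given the rounds in which a loss occurs."""
    window, in_slow_start = 1, True
    history = []
    for r in range(rounds):
        history.append(window)
        window, in_slow_start = next_window(window, in_slow_start, r in loss_rounds)
    return history

history = window_history(loss_rounds={4, 9}, rounds=12)
print(history)
```

With losses in rounds 4 and 9, the window doubles up to 16, drops to 8, climbs by one per round to 12, then drops to 6 and starts climbing again, giving the familiar sawtooth shape.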
This algorithm does a good job of approximating the maximum bandwidth and adapting to change in the network.
Another parameter that can change is the amount of time to wait before a timeout occurs (while waiting for acknowledgements). Ideally the timeout duration would be slightly longer than the round-trip time from the sender to the receiver and back.
Most TCP implementations compute a weighted historical average of the round-trip time. The initial estimate t0 comes from the TCP handshake. Each time an acknowledgement is received, the amount of time between when the packet was sent and when the ACK was received gives a more recent estimate of the round-trip time. This can be used to form an updated estimate to use for future timeouts.
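One common way to form such a weighted average is an exponentially weighted moving average, sketched below. The class name, the weight `alpha = 0.125`, and the `margin` factor used to turn the estimate into a timeout are illustrative assumptions; real implementations typically also track how much the round-trip time varies.

```python
class RttEstimator:
    """Exponentially weighted moving average of round-trip time samples."""

    def __init__(self, initial_rtt_ms, alpha=0.125, margin=1.5):
        self.estimate_ms = initial_rtt_ms   # t0, e.g. measured during the handshake
        self.alpha = alpha                  # weight given to each new sample
        self.margin = margin                # hypothetical safety factor

    def observe(self, sample_ms):
        # New estimate = mostly the old estimate, nudged toward the new sample.
        self.estimate_ms = ((1 - self.alpha) * self.estimate_ms
                            + self.alpha * sample_ms)
        return self.estimate_ms

    @property
    def timeout_ms(self):
        # "Slightly longer" than the estimated round-trip time.
        return self.margin * self.estimate_ms

estimator = RttEstimator(initial_rtt_ms=100)
for sample in (180, 110, 30):
    estimator.observe(sample)
    print(f"sample={sample} estimate={estimator.estimate_ms:.2f} "
          f"timeout={estimator.timeout_ms:.2f}")
```

A small `alpha` makes the estimate stable against one unusually slow or fast acknowledgement, at the cost of reacting more slowly when the network's latency really changes.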