Stream Sockets
Let us explore the basics of stream sockets using TCP, the type of socket most frequently encountered in systems programming. Setting up a socket and sending and receiving data with other socket types and protocols involves many of the same system calls, but each has its own rules, which you would need to learn from the associated documentation. This section serves as a brief overview of networking, which is covered in greater detail in other courses.
Establishing a Connection
TCP sockets are connection-oriented, meaning that before data can be transmitted between two hosts, a connection must be explicitly established between them. This is in contrast to connectionless protocols, such as UDP, which allow hosts to transmit unsolicited data to each other. The process of establishing a connection begins with one host, the server, creating a socket with socket(), binding it to a particular port with bind(), and establishing a listen queue with listen(). The listen queue holds incoming connection requests until the server accepts them; the kernel populates it as other hosts attempt to connect.
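The server-side steps can be sketched in C as follows. This is a minimal sketch, assuming IPv4 and the loopback address 127.0.0.1 so that it runs on a single machine; it fills in a struct sockaddr_in by hand rather than using the getaddrinfo() lookup described below, and the helper names make_listener() and local_port() are hypothetical, chosen for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1:port and put
 * it in the listening state. Port 0 asks the kernel to pick a free port.
 * Returns the listening descriptor, or -1 on error. */
int make_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* stream socket, i.e. TCP */
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 only */

    /* bind() attaches the address; listen() creates the listen queue,
     * whose size is hinted by backlog. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: report the port the socket is actually bound to
 * (useful when port 0 was requested). Returns -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

Passing port 0 lets the kernel choose a free port, which local_port() then reports via getsockname(); a real server would normally bind a fixed, well-known port so that clients know where to connect.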
The other host, the client, then initiates a new connection by creating its own socket with socket() and connecting it to the server’s IP address and listening port with connect(). The server then accepts the request from its listen queue with accept(), which returns an entirely new socket, separate from the one associated with the listen queue; the server uses this new socket to communicate with that specific client.
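The client's socket() and connect() calls and the server's accept() can be sketched together in one function. This is a minimal, self-contained sketch that plays both roles in a single process for demonstration, assuming IPv4 loopback and a kernel-chosen port; the names loopback_addr() and connect_and_accept() and the backlog of 8 are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in the IPv4 loopback address 127.0.0.1 with the given port. */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Builds a listening socket, connects a client to it, and accepts that
 * connection. On return, any descriptor that is not -1 is open and must be
 * closed by the caller. Returns 0 on success, -1 on error. */
int connect_and_accept(int *listen_fd, int *client_fd, int *conn_fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    *listen_fd = *client_fd = *conn_fd = -1;

    /* Server: socket(), bind() to a kernel-chosen port, listen(). */
    loopback_addr(&addr, 0);
    *listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (*listen_fd == -1 ||
        bind(*listen_fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(*listen_fd, 8) == -1 ||
        getsockname(*listen_fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;

    /* Client: socket(), then connect() to the server's address and port
     * (addr now holds the port the kernel assigned). connect() can return
     * before the server calls accept(): the kernel completes the handshake
     * and places the connection in the listen queue. */
    *client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (*client_fd == -1 ||
        connect(*client_fd, (struct sockaddr *)&addr, sizeof addr) == -1)
        return -1;

    /* Server: accept() removes the connection from the listen queue and
     * returns a NEW descriptor for this client; *listen_fd keeps listening. */
    *conn_fd = accept(*listen_fd, NULL, NULL);
    return *conn_fd == -1 ? -1 : 0;
}
```

Running both ends in one process is only for illustration; normally the client and server are separate programs, often on different hosts. Note that the accepted descriptor differs from the listening one, and data written on the client descriptor is read from the accepted descriptor.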
For both the server and the client, the getaddrinfo() function is used to look up the address information corresponding to a particular host, port (service), and protocol. It takes a set of hints and returns a linked list of address info structures that match them. The server and client each iterate over this list, attempting to bind or connect, respectively, until an attempt succeeds. This accounts for cases where, for example, a host has multiple network interfaces, multiple IP addresses, or both IPv4 and IPv6 addresses.
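The lookup-and-retry pattern can be sketched in C as follows. This is a minimal sketch: the helper names lookup(), tcp_listen(), and tcp_connect() are hypothetical, the backlog of 16 is arbitrary, and error reporting is reduced to a single message on stderr. Production servers commonly also set socket options such as SO_REUSEADDR before bind(), which is omitted here.

```c
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() in strict C modes */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask getaddrinfo() for TCP addresses matching host and port. With
 * passive != 0 and host == NULL, the results are wildcard addresses
 * suitable for bind(). Returns a linked list, or NULL on failure. */
static struct addrinfo *lookup(const char *host, const char *port, int passive)
{
    struct addrinfo hints;
    struct addrinfo *list = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM;  /* stream socket, i.e. TCP */
    hints.ai_flags = passive ? AI_PASSIVE : 0;

    int rc = getaddrinfo(host, port, &hints, &list);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return NULL;
    }
    return list;
}

/* Server: try each result until socket(), bind(), and listen() succeed. */
int tcp_listen(const char *host, const char *port)
{
    struct addrinfo *list = lookup(host, port, 1);
    int fd = -1;

    if (list == NULL)
        return -1;
    for (struct addrinfo *ai = list; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0 && listen(fd, 16) == 0)
            break;                    /* success: keep this descriptor */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);
    return fd;
}

/* Client: try each result until socket() and connect() succeed. */
int tcp_connect(const char *host, const char *port)
{
    struct addrinfo *list = lookup(host, port, 0);
    int fd = -1;

    if (list == NULL)
        return -1;
    for (struct addrinfo *ai = list; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* success: keep this descriptor */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);
    return fd;
}
```

For example, tcp_listen(NULL, "8080") would listen on a wildcard local address, and tcp_connect("localhost", "8080") would then connect to it; port 8080 is only an illustrative choice.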
Because accept() returns a new socket for each connection, a server can accept multiple concurrent client connections and communicate with each one individually, all on the same server-side port. TCP supports this by identifying each connection with four values: source IP address, source port, destination IP address, and destination port (the addresses are carried in the IP header and the ports in the TCP header). When a client connects to a server, it typically uses an ephemeral (temporary) port number assigned automatically by its kernel. Since the combination of client IP address and ephemeral port is unique among the connections to a given server port, the server can use its own IP address and a single port number to listen for and communicate with many clients at once.
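To see the four-value connection identifier at work, the sketch below connects two clients to the same listening port within one process, accepts both connections, and compares ports using getsockname() (local end) and getpeername() (remote end). It assumes IPv4 loopback and a kernel-chosen port; the names port_of() and two_clients() are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Port of the local end (peer == 0, via getsockname) or remote end
 * (peer != 0, via getpeername) of an IPv4 socket; -1 on error. */
static int port_of(int fd, int peer)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int rc = peer ? getpeername(fd, (struct sockaddr *)&a, &len)
                  : getsockname(fd, (struct sockaddr *)&a, &len);
    return rc == 0 ? ntohs(a.sin_port) : -1;
}

/* Connect two clients to one listening port and accept both.
 * fds[0] = listener, fds[1..2] = clients, fds[3..4] = accepted sockets.
 * Unused slots are left at -1. Returns 0 on success, -1 on error. */
int two_clients(int fds[5])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    for (int i = 0; i < 5; i++)
        fds[i] = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* kernel picks a free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    fds[0] = socket(AF_INET, SOCK_STREAM, 0);
    if (fds[0] == -1 ||
        bind(fds[0], (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fds[0], 8) == -1 ||
        getsockname(fds[0], (struct sockaddr *)&addr, &len) == -1)
        return -1;

    for (int i = 1; i <= 2; i++) {
        /* Each client socket is given its own ephemeral source port. */
        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (fds[i] == -1 ||
            connect(fds[i], (struct sockaddr *)&addr, sizeof addr) == -1)
            return -1;
        fds[i + 2] = accept(fds[0], NULL, NULL);
        if (fds[i + 2] == -1)
            return -1;
    }
    return 0;
}
```

Both accepted sockets report the listener's single port as their local port, while their peer ports differ, one ephemeral port per client. Since both clients share the IP address 127.0.0.1 here, the client ports alone are what distinguish the two connections.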
Communication Between Hosts
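After a connection is established, each end of the connection is a socket that works much like a regular file descriptor for sending and receiving data. Data written into one side of the connection with the write() system call can be read from the other side with the read() system call. TCP guarantees reliable, in-order delivery of a stream of bytes, which is very similar to other familiar file-like objects such as pipes. As with a pipe, the stream has no message boundaries: a single read() may return only part of the data from one write(), or data from several writes, and write() can likewise transfer fewer bytes than requested, so programs typically call these functions in a loop.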
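Because a single read() or write() may transfer fewer bytes than requested, stream-socket code commonly wraps both calls in loops. The helpers below are a minimal sketch with the hypothetical names write_all() and read_full(); they work on any connected stream descriptor, including TCP sockets and pipes, and retry calls interrupted by a signal (EINTR).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* write() may accept fewer bytes than requested, so loop until all of
 * buf has been handed to the kernel. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* The byte stream preserves order but not write() boundaries, so loop
 * until len bytes arrive or the peer signals end-of-file (read() == 0).
 * Returns the number of bytes read (less than len only at EOF), or -1. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                     /* EOF: the writer has finished */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

For example, a receiver expecting a hypothetical fixed-size 4-byte header could call read_full(fd, hdr, 4) and treat a return value below 4 as the peer ending the connection early.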
Terminating a Connection
Once data transmission is complete, the connection must be cleanly terminated. This is accomplished through an orderly shutdown process. Each host can disable reading, writing, or both on its socket using shutdown(). If a host shuts down the read side of its socket and then receives more data, that data is discarded and can trigger a TCP RST (connection reset) reply packet, which is an abnormal connection termination. To prevent this, a host that is done sending data must first shut down the write side of its socket. The other host then sees end-of-file (i.e., read() returns 0) after it finishes reading any remaining data; at this point it is assured that no more data can arrive, so it can safely shut down the read side of its socket. The process then repeats in the opposite direction, after which both sides can safely close their sockets with close(), terminating the connection.
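The sequence for one side can be sketched as a single helper. This is a minimal sketch with the hypothetical name graceful_close(), assuming a blocking socket and that the caller has already written everything it intends to send. Once read() returns 0 it calls close() directly rather than a separate shutdown(fd, SHUT_RD), since close() ends both directions.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Orderly close for one end of a connected stream socket. *drained
 * receives the number of bytes that arrived after our write-side shutdown;
 * a real program would process that data rather than discard it.
 * Returns 0 on success, -1 on error (fd is closed either way). */
int graceful_close(int fd, size_t *drained)
{
    char buf[512];
    ssize_t n;

    *drained = 0;

    /* 1. Stop sending: the peer's read() returns 0 once it has consumed
     *    everything we already sent. */
    if (shutdown(fd, SHUT_WR) == -1) {
        close(fd);
        return -1;
    }

    /* 2. Keep reading until our own read() returns 0, meaning the peer has
     *    shut down its write side and no more data can arrive. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        *drained += (size_t)n;

    /* 3. Nothing more can be sent or received, so close() is safe. */
    if (close(fd) == -1 || n == -1)
        return -1;
    return 0;
}
```

Because step 2 blocks until the peer shuts down its own write side, a production version would typically bound that wait with a timeout.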