OS Interface: Sockets

We saw earlier the abstract ideas behind OS-supported message passing, which can be implemented on the network layers mentioned above. To show how an OS layer sits on top of these network layers, let us consider the concrete example of Unix sockets. Unix sockets have several distinguishing characteristics. They are not tied to a particular network protocol, and provide a uniform process-to-process interface to UDP, TCP/IP, and other protocols. They are integrated with the file system: socket descriptors are essentially a special case of file descriptors. Moreover, like TCP, they were designed for the transfer of bulk data such as files. They support a combination of free, input, and bound ports, and allow new input and free ports to be created dynamically. Unlike the Xinu IPC mechanism you have seen so far, they allow processes in different address spaces and on different hosts to communicate with each other. We see below how these goals are met.

Socket declarations are given in the file sys/socket.h. The following code fragments executed by the server and client illustrate the use of sockets. The server executes code of the form:

----------datagram--------
server_end = socket (af, type, protocol)
bind (server_end, local_addr, local_addr_len)
recvfrom (server_end, msg, length, flags, &from_addr, &from_addr_len)
sendto (server_end, msg, length, flags, dest_addr, addr_len)
connect (server_end, dest_addr, dest_addr_len) -- optional
read, write, send, recv
---------stream----------
input_socket = socket (af, type, protocol)
bind (input_socket, local_addr, local_addr_len)
listen (input_socket, qlength) 
server_end = accept (input_socket, &remote_addr, &remote_addr_len)

write (server_end, msg, len)
or
send (server_end, msg, length, flags)

read (server_end, msg, len)
or
recv (server_end, msg, len, flags)

and the client similarly executes:

-------datagram-------
client_end = socket (af, type, protocol)
bind (client_end, local_addr, local_addr_len)
sendto (client_end, msg, length, flags, dest_addr, addr_len)
connect (client_end, dest_addr, dest_addr_len) -- optional
read, write, send, recv
--------stream------
client_end = socket (af, type, protocol)
/* client end is automatically bound to some port number chosen by the system */
connect (client_end, dest_addr, dest_addr_len)
read, write, send, recv

Datagram communication: Consider first datagram communication. A server providing a service first uses the socket() call to create a socket that serves as an end point for communication. The af parameter indicates the address family or name space (AF_INET - (host, port) pairs, AF_UNIX - Unix file system names, AF_APPLETALK), type indicates the type of connection (SOCK_DGRAM, SOCK_STREAM), and protocol indicates the protocol to be used (IPPROTO_IP - system picks, IPPROTO_TCP, ...).
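As a minimal sketch of the socket() call, the fragment below creates an Internet-domain socket of a requested type and lets the system pick the protocol by passing 0. The helper names make_inet_socket and socket_type are hypothetical, and the SO_TYPE query is added here only to show that the descriptor really carries the requested type.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create an AF_INET socket of the given type
 * (SOCK_DGRAM or SOCK_STREAM). A protocol of 0 lets the system pick
 * the default for the (family, type) pair.
 * Returns the socket descriptor, or -1 on error. */
int make_inet_socket(int type)
{
    return socket(AF_INET, type, 0);
}

/* Hypothetical helper: ask the system which type a socket has.
 * Returns SOCK_DGRAM, SOCK_STREAM, ..., or -1 on error. */
int socket_type(int sock)
{
    int type = -1;
    socklen_t len = sizeof type;

    if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

The value returned by socket() is an ordinary descriptor, so it is released with close() like a file descriptor.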

The bind call binds the socket to a source address. A separate external name space is needed for the source address since communicating processes do not share memory. The structure of this address is what the name space (af) argument of the previous call specifies. The external address is a Unix file name if the address family is AF_UNIX. The system creates an internal file in which it stores information about the receiving processes and the socket descriptors they have bound to the file. This name space is restrictive, as it does not allow processes that do not share a file system to communicate with each other.
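Binding to a file name can be sketched as follows. This is an illustration under assumptions, not code from the notes: the helper name bind_unix_dgram is hypothetical, the address structure sockaddr_un comes from sys/un.h, and the sketch removes any stale name first because bind fails if the name already exists.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a datagram socket in the Unix name space
 * and bind it to the file name path.
 * Returns the descriptor, or -1 on error. */
int bind_unix_dgram(const char *path)
{
    struct sockaddr_un addr;
    int sock;

    if (strlen(path) >= sizeof addr.sun_path)
        return -1;                  /* name does not fit in sun_path */

    sock = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length was checked above */

    unlink(path);                   /* drop a stale name from an earlier run */
    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

After a successful bind the name is visible in the file system, and it stays there until it is unlinked, even after the socket is closed.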

In the case of the popular AF_INET address family, the receiver's IP address and a port number are used as the external address. For each port number on the receiver machine, the system keeps information about the receiving processes and their socket descriptors. The structure of local IP addresses is given in the file netinet/in.h.

As mentioned before, the external address indicates an internet address and port number. The port number is chosen by the server. Port numbers 0..1023 are reserved for system servers such as ftp and telnet. Look at the file /etc/services for the port numbers of system servers. A user-defined server must choose an unbound port number greater than 1023. You can explicitly pick one of these values and hope no other process is using the port. A port number of 0 in the port field indicates that the system should choose the next unbound port number, which can then be read back by the server using getsockname. When communicating internet addresses, port numbers, and other data among machines with different byte orders, you should use routines (such as htons, htonl, and ntohs) that convert between network and host byte orders for shorts/longs. A bind call can use a special constant (htonl (INADDR_ANY)) for an internet address to indicate that it will listen on any of the internet addresses of the local host.
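The port-0 convention and the byte-order routines can be sketched as below. The helper name bind_any_port is hypothetical; the calls themselves (bind, getsockname, htonl, htons, ntohs) are the ones named in the text.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: bind sock to any local internet address and let
 * the system choose an unbound port.
 * Returns the chosen port in host byte order, or -1 on error. */
int bind_any_port(int sock)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local address */
    addr.sin_port = htons(0);                   /* 0: system picks the port */

    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;

    /* Read back the address the system actually assigned. */
    if (getsockname(sock, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);                /* network to host order */
}
```

A server that binds this way has to publish the returned port number somehow, since clients cannot guess it.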

To talk to a server using datagrams, a client needs to create a connection endpoint through its own socket and bind calls. The server and client can now talk to each other using the sendto and recvfrom calls.
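The datagram exchange can be sketched in a single process by playing both roles over the loopback address. This is only an illustration under that assumption: in practice the two ends live in different processes, and the helper names bound_udp_socket and udp_echo_once are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to the loopback
 * address and a system-chosen port. On success *addr holds the bound
 * address. Returns the descriptor, or -1 on error. */
int bound_udp_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);
    if (bind(sock, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(sock, (struct sockaddr *)addr, &len) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Send msg from client_end to server_end and echo it back.
 * reply must have room for at least len bytes.
 * Returns the number of bytes the client got back, or -1 on error. */
ssize_t udp_echo_once(const char *msg, size_t len, char *reply)
{
    struct sockaddr_in server_addr, client_addr, from_addr;
    socklen_t from_len = sizeof from_addr;
    char buf[512];
    ssize_t n = -1;
    int server_end = bound_udp_socket(&server_addr);
    int client_end = bound_udp_socket(&client_addr);

    if (server_end < 0 || client_end < 0 || len > sizeof buf)
        goto done;

    /* Client to server: the destination is named on every send. */
    if (sendto(client_end, msg, len, 0,
               (struct sockaddr *)&server_addr, sizeof server_addr) < 0)
        goto done;

    /* The server learns the client's address from recvfrom. */
    n = recvfrom(server_end, buf, sizeof buf, 0,
                 (struct sockaddr *)&from_addr, &from_len);
    if (n < 0)
        goto done;

    /* Server to client, using the address just received. */
    if (sendto(server_end, buf, (size_t)n, 0,
               (struct sockaddr *)&from_addr, from_len) < 0) {
        n = -1;
        goto done;
    }
    n = recvfrom(client_end, reply, len, 0, NULL, NULL);

done:
    if (server_end >= 0)
        close(server_end);
    if (client_end >= 0)
        close(client_end);
    return n;
}
```

Note that no connection is set up at any point: each sendto carries its own destination address.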

Now consider stream communication. The server uses the socket() call to create a socket that serves as an "input port" for creating new stream-based sockets.

To talk to the server, a client needs to create a connection endpoint through its own socket call. Then it can use connect to link its socket with the server socket in a stream. If it does not have the internet address of the host, it can determine this address from the host name using gethostbyname.
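Filling in the destination address from a host name might look like the sketch below. The helper name make_server_addr is hypothetical, and the sketch assumes an IPv4 address is wanted; it uses gethostbyname because that is the routine named above.

```c
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *addr with the internet address for
 * (host, port), where port is in host byte order.
 * Returns 0 on success, -1 if the name cannot be resolved to an
 * IPv4 address. */
int make_server_addr(const char *host, unsigned short port,
                     struct sockaddr_in *addr)
{
    struct hostent *hp = gethostbyname(host);

    if (hp == NULL || hp->h_addrtype != AF_INET ||
        (size_t)hp->h_length != sizeof addr->sin_addr)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    /* The resolved address is already in network byte order. */
    memcpy(&addr->sin_addr, hp->h_addr_list[0], sizeof addr->sin_addr);
    addr->sin_port = htons(port);
    return 0;
}
```

The resulting structure is what the client passes to connect as dest_addr.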

A socket bound by the server serves as an ``input port'' to which multiple clients can connect. The connect call creates a new ``bound port'' for the server-client connection. The client end of the bound port is the socket connected by the client. The other end is returned to the server by the accept call when a successful connection is established by the client. Typically, a server forks a new copy of itself when the connection is established, with the copy inheriting the new socket descriptor. The server limits the number of pending connections queued on its input socket by passing an appropriate queue length (qlength) to the non-blocking listen call, which must be made before any accept operation is invoked on the socket. For UDP datagrams, no connections need to be established through accept and connect (as we saw before); the connect call can still be invoked, but it is a local operation that simply stores the intended remote address with the socket and allows the use of read() and write(). In the stream case, the client usually does not bind its end of the socket to a local port; the system automatically binds the socket to an anonymous port. Sockets are not strictly input or bound ports, since they can be inherited and accessed by children of the process that created them (through the mechanism of file descriptor inheritance we shall see later).
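The stream sequence can be sketched in one process over the loopback address. This is an illustration under assumptions: a real server and client are separate processes, the server would typically fork after accept, and the helper name stream_once and the queue length of 5 are hypothetical choices.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: send msg from a client to a server over one
 * stream connection. out must have room for at least len bytes.
 * Returns the number of bytes the server read, or -1 on error. */
ssize_t stream_once(const char *msg, size_t len, char *out)
{
    struct sockaddr_in addr, remote_addr;
    socklen_t addr_len = sizeof addr, remote_len = sizeof remote_addr;
    int input_socket = -1, client_end = -1, server_end = -1;
    ssize_t n = -1;

    /* Server: create the input socket, bind it, and start listening. */
    input_socket = socket(AF_INET, SOCK_STREAM, 0);
    if (input_socket < 0)
        goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* system picks the port */
    if (bind(input_socket, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(input_socket, (struct sockaddr *)&addr, &addr_len) < 0 ||
        listen(input_socket, 5) < 0)
        goto done;

    /* Client: no bind; the system assigns a local port at connect. */
    client_end = socket(AF_INET, SOCK_STREAM, 0);
    if (client_end < 0 ||
        connect(client_end, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;

    /* Server: accept returns a new socket for this connection;
     * input_socket stays open for further clients. */
    server_end = accept(input_socket,
                        (struct sockaddr *)&remote_addr, &remote_len);
    if (server_end < 0)
        goto done;

    if (write(client_end, msg, len) != (ssize_t)len)
        goto done;
    /* A stream read may return fewer bytes than requested; a real
     * server would loop until it has the whole message. */
    n = read(server_end, out, len);

done:
    if (server_end >= 0)
        close(server_end);
    if (client_end >= 0)
        close(client_end);
    if (input_socket >= 0)
        close(input_socket);
    return n;
}
```

The connect call can return before accept is called here because the pending connection is queued on the input socket, which is the role of the queue length passed to listen.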

Data can be sent and received using either the regular read and write calls or the special send and recv calls, which take additional message-specific arguments, for example to send or receive ``out of band'' data (which makes sense for stream-based communication, in which data are normally received in the order they are sent).

If a process is willing to receive data on more than one I/O descriptor (socket, standard input), then it should use the select call:

int select (int width, fd_set *readfds, fd_set *writefds,
            fd_set *exceptfds, struct timeval *timeout);

The call blocks until activity occurs on the file descriptors specified in the arguments or the timeout expires. The activity could be a descriptor becoming ready for a read or write, or the occurrence of an exceptional condition. The file descriptors are specified by setting bits in the fd_set bitmasks. Only bits 0..width-1 are examined. On return, the bitmasks indicate the descriptors on which the activities occurred, and the program can then use them in subsequent read/write calls. When waiting on several descriptors, you should always do a select before doing a read or write, since an I/O call on a descriptor that is not ready may block and keep the process from servicing the others.
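A minimal sketch of waiting on two descriptors follows. The helper name wait_for_input and the millisecond timeout parameter are hypothetical; the FD_ZERO, FD_SET, and FD_ISSET macros are the standard way to manipulate the fd_set bitmasks.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd_a or fd_b is readable, or until
 * timeout_ms milliseconds have passed.
 * Returns the readable descriptor (fd_a if both are readable),
 * or -1 on timeout or error. */
int wait_for_input(int fd_a, int fd_b, int timeout_ms)
{
    fd_set readfds;
    struct timeval timeout;
    int width = (fd_a > fd_b ? fd_a : fd_b) + 1;  /* highest fd plus one */

    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    /* select returns 0 on timeout and -1 on error. */
    if (select(width, &readfds, NULL, NULL, &timeout) <= 0)
        return -1;

    /* On return the bitmask holds only the ready descriptors. */
    if (FD_ISSET(fd_a, &readfds))
        return fd_a;
    if (FD_ISSET(fd_b, &readfds))
        return fd_b;
    return -1;
}
```

Because select overwrites the bitmasks, they are rebuilt on every call; a read on the returned descriptor will then not block.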

Prasun Dewan 2006-02-02