Chapter 2. Sockets-based Communication

This chapter describes the BSD sockets-based Inter-Process Communication (IPC) facilities available with the IRIX operating system.

Sockets Basics

A socket is the basic building block for program-to-program communication. A socket is an endpoint of communication to which a name can be bound. Each socket in use has a type and one or more associated processes. Sockets are typed according to their communication properties.

Four socket types are available:

  • Stream sockets provide a bidirectional, reliable, sequenced, and unduplicated flow of message data.

  • Datagram sockets support bidirectional data flow, but do not guarantee that the message data is sequenced, reliable, or unduplicated.

  • ST sockets provide a reliable, sequenced, and unduplicated flow of message data and require data transfers to be scheduled in advance.

  • Raw sockets give you access to the underlying communication protocols that support socket abstractions.

(All four socket types are described in “Socket Types”.)

The processes associated with a socket communicate through the socket. Sockets are presumed to communicate with sockets of the same type; however, nothing prevents communication between sockets of different types should the underlying communication protocols support it.

Sockets exist within communication domains. A domain dictates various properties of the socket. One such property is the scheme used to name sockets. In the UNIX communication domain, for example, sockets are named with UNIX pathnames; a socket might be named /dev/foo.

Normally, sockets exchange data within the same domain. It may be possible to cross domain boundaries, but only if some translation process is performed.

The sockets facility supports three communication domains:

  • The UNIX domain is used only for on-system communication.

  • The Internet domain is used by processes that communicate using the Internet standard communication protocols IP/TCP/UDP.

  • The Raw domain provides access to the link-level protocols of network interfaces (unique to IRIX).

The underlying communication facilities provided by each domain significantly influence the interface to the sockets facilities available to users, providing protocol-specific socket properties that may be set or changed by the user. For example, a socket operating in the UNIX domain can see a subset of the error conditions that are possible when operating in the Internet domain.

In general, there is one protocol for each socket type within each domain. The code that implements a protocol keeps track of the names that are bound to sockets, sets up connections, and transfers data between sockets, perhaps sending the data across a network. It is possible for several protocols, differing only in low-level details, to implement the same style of communication within a particular domain. Although it is possible to select which protocol should be used, for nearly all uses it is sufficient to request the default protocol.

Socket Types

This section describes the four socket types: stream sockets, datagram sockets, ST sockets, and raw sockets.

Stream Sockets

A stream socket provides a bidirectional, reliable, sequenced, and unduplicated flow of data without record boundaries. Aside from the bidirectionality of data flow and some additional signaling facilities, a pair of connected stream sockets provides an interface similar to that of a pipe. (In the UNIX domain, in fact, the semantics are identical.)


Note: Stream sockets should not be confused with STREAMS, the modularized driver interface on which TLI is built.


Datagram Sockets

A datagram socket supports the bidirectional flow of messages that are not necessarily sequenced, reliable, or unduplicated. That is, a process receiving messages on a datagram socket can find messages duplicated or in a different order. The data in any single message, however, is in the correct order, with no duplications, deletions, or changes.

An important characteristic of a datagram socket is that record boundaries in the data are preserved. Datagram sockets closely model facilities found in many packet-switched networks. However, datagram sockets provide additional facilities, including routing and fragmentation.

Routing is used to forward messages from one local network to another nearby or distant network. Dividing one large network into several smaller ones can improve network performance in each smaller network, improve security, and facilitate administration and troubleshooting.

Fragmentation divides large messages into pieces small enough to fit on the local medium. It allows application programs to use a single message size independent of the packet size limitations of the underlying networks.
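The difference in record boundaries between stream and datagram sockets can be observed directly. The following is a minimal sketch, not part of the original text: first_receive_size() is a hypothetical helper, and it uses socketpair(2) only as a convenient way to get two connected UNIX domain sockets in one process.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: send two 3-byte messages over a connected pair
 * of UNIX domain sockets of the given type and return the size of the
 * first receive, or -1 on error.
 */
ssize_t first_receive_size(int type)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    /* Two separate sends of three bytes each. */
    if (send(sv[0], "abc", 3, 0) != 3 || send(sv[0], "def", 3, 0) != 3) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The buffer is large enough to hold both messages. */
    n = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first receive returns exactly one three-byte message. With SOCK_STREAM the two writes may be merged, so the receive can return anything from three to six bytes.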

ST Sockets

An ST socket provides a reliable, sequenced, and unduplicated flow of data. The ST protocol provides the receiving host or device more control over the flow of data by requiring data transfers to be scheduled in advance and does not allow data to be sent until the resources to support the transfer have been allocated and reserved on the receiving host or device.

Raw Sockets

A raw socket provides access to the underlying communication protocols that support socket abstractions. Raw sockets are normally datagram-oriented, though their exact characteristics depend on the interface provided by the protocol.

Raw sockets are not intended for the general user. They are provided for programmers interested in developing new communication protocols or for gaining access to some of the more esoteric facilities of an existing protocol.

Creating Sockets

To create a socket, use the socket() system call (see socket(2)):

#include <sys/types.h>
#include <sys/socket.h>
s = socket(domain, type, protocol);

This call creates a socket in the specified domain, of the specified type, using the specified protocol, and returns a descriptor (a small integer) that can be used in later system calls operating on sockets.

If protocol is not specified (a 0 value is given), a default protocol is used. The system selects from the protocols that make up the communication domain and that can be used to support the requested socket type.

The domain is specified as one of the manifest constants defined in the file <sys/socket.h>:

AF_UNIX    UNIX domain
AF_INET    Internet domain
AF_RAW     Raw domain


Note: AF indicates the address family (or format) to use in interpreting names.


The socket types are also defined in <sys/socket.h>, as SOCK_STREAM, SOCK_DGRAM, or SOCK_RAW.

For example, to create a stream socket in the Internet domain, you could use this call:

s = socket(AF_INET, SOCK_STREAM, 0);

This creates a stream socket in which underlying communication support is provided by the default protocol, TCP.

The default protocol should be correct for most situations. However, you can specify other protocols; see “Selecting Protocols” for details.

To create a datagram socket for same-machine use, the call might be:

s = socket(AF_UNIX, SOCK_DGRAM, 0);

To create an ST socket in the Internet domain, you could use this call:

s = socket(AF_INET, SOCK_SEQPACKET, 0);

This creates an ST socket in which underlying communication support is provided by the IP protocol.

To create a drain socket, which receives all packets that have a network-layer type-code or encapsulation not implemented by the kernel, use this call:

#include <net/raw.h>
s = socket(AF_RAW, SOCK_RAW, RAWPROTO_DRAIN);

For details about raw domain sockets, see the manual pages for raw(7F), snoop(7P), and drain(7P).

A socket() call can fail for several reasons, each of which sets the errno variable appropriately. Aside from the rare occurrence of lack of memory (ENOBUFS), a socket request can fail in response to a request for an unknown protocol (EPROTONOSUPPORT) or a request for a type of socket for which there is no supporting protocol (EPROTOTYPE).
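As an illustration of the error handling described above, the following sketch wraps socket() and reports the failure cases just named. The wrapper open_socket() and its messages are hypothetical; only the socket() call and the errno values come from the text.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: create a socket with the default protocol and
 * report why the request failed. Returns the descriptor, or -1.
 */
int open_socket(int domain, int type)
{
    int s = socket(domain, type, 0);   /* 0 selects the default protocol */
    int err;

    if (s >= 0)
        return s;

    err = errno;                       /* save before calling stdio */
    switch (err) {
    case EPROTONOSUPPORT:
        fprintf(stderr, "socket: unknown protocol\n");
        break;
    case EPROTOTYPE:
        fprintf(stderr, "socket: no protocol supports this type\n");
        break;
    case ENOBUFS:
        fprintf(stderr, "socket: out of buffer space\n");
        break;
    default:
        fprintf(stderr, "socket: %s\n", strerror(err));
        break;
    }
    return -1;
}
```

A call such as open_socket(AF_INET, SOCK_STREAM) then either returns a usable descriptor or prints the reason and returns -1.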

Binding Local Names to a Socket

A socket is created without a name. Until a name is bound to the socket, processes have no way to reference it, and, consequently, no messages can be received on it.

Communicating processes are bound by an association. An association is a temporary or permanent specification of a pair of communicating sockets.

In the Internet domain, an association is composed of local and foreign addresses, and local and foreign ports. The structure of Internet domain addresses is defined in the file <netinet/in.h>.

Internet addresses specify a host address (a 32-bit number) and a delivery slot, or port, on that machine. These ports are managed by the system routines that implement a particular protocol. Unlike UNIX domain socket names, Internet domain socket names are not entered into the filesystem and, therefore, do not have to be unlinked after the socket is closed.

When a message is exchanged between machines, it is first sent to the protocol routine on the destination machine. This routine interprets the address to determine to which socket the message should be delivered. Several different protocols may be active on the same machine, but, in general, they will not communicate with one another. As a result, different protocols are allowed to use the same port numbers. Thus, an Internet address is effectively a triple consisting of a protocol, a port, and a machine address.

An Internet association is identified by the tuple <protocol, local address, local port, remote address, remote port>. Duplicate tuples are not allowed. An association may be transient when using datagram sockets; the association actually exists during a send() operation.

In the UNIX domain, an association is composed of local and foreign pathnames (a foreign pathname is a pathname created by a foreign process, not a pathname on a foreign system). UNIX domain sockets need not always be bound to a name, but when they are bound, there may never be duplicate <protocol, local pathname, foreign pathname> tuples.

The pathnames may not refer to files already existing on the system. Like pathnames for normal files, they may be either absolute (for example, /dev/imaginary) or relative (for example, socket). Because these names are used to allow processes to rendezvous, relative pathnames can pose difficulties and should be used with care.

When a name is bound into the name space, a file (inode) is allocated in the filesystem. If the inode is not deallocated, the name will continue to exist even after the bound socket is closed. This situation can cause subsequent runs of a program to find a name unavailable and can cause directories to fill up with these objects. You can remove names by calling unlink() (see unlink(2)) or by using the rm command.

Names in the UNIX domain are used only for rendezvous; they are not used for message delivery once a connection is established. Therefore, in contrast to the Internet domain, unbound sockets are not, and need not be, automatically given addresses when they are connected.

The bind() system call (see bind(2)) allows a process to specify half of an association, <local address, local port> (or <local pathname>), while the connect() and accept() system calls are used to complete a stream socket's association.

The form of the bind() system call is:

bind(s, name, namelen);

The bound name is a variable-length byte string that is interpreted by the supporting protocol(s). The interpretation of the bound name may vary from communication domain to communication domain (this is one of the properties that make up the domain).

In the UNIX domain, names contain a pathname and a family, which is always AF_UNIX. The following code fragment binds the name /tmp/foo to a UNIX domain socket:

#include <sys/un.h>
 ...
struct sockaddr_un addr;
 ...
strcpy(addr.sun_path, "/tmp/foo");
addr.sun_family = AF_UNIX;
bind(s, (struct sockaddr *)&addr, strlen(addr.sun_path) +
     sizeof(addr.sun_family));

Note that in determining the size of a UNIX domain address, null bytes are not counted, which is why strlen() is used.


Note: In the current implementation of UNIX domain IPC under IRIX, the filename referred to in addr.sun_path is created as a socket in the system's file space. The caller must, therefore, have write permission in the directory where addr.sun_path is to reside, and this file should be deleted by the caller when it is no longer needed using the unlink() system call (see unlink(2)). Future versions of IRIX may not create this file.
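The binding and cleanup steps described above might be combined as in the following sketch. The helper names bind_unix_name() and release_unix_name() are hypothetical. The sketch also passes sizeof(addr) for a zero-filled structure, a common idiom on current POSIX systems and an assumption here; the length computation shown earlier is the one this chapter documents.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UNIX domain stream socket bound to
 * path. Returns the descriptor, or -1 on error.
 */
int bind_unix_name(const char *path)
{
    struct sockaddr_un addr;
    int s;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* name would not fit */

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* Remove a name left behind by an earlier run; errors are ignored. */
    unlink(path);

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Close the socket and remove its name from the filesystem. */
void release_unix_name(int s, const char *path)
{
    close(s);
    unlink(path);
}
```

Removing a stale name before binding is a design choice suited to a program that owns the pathname; a program that shares a directory with others may prefer to fail instead.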

In the Internet domain, binding names to sockets can be fairly complex. Fortunately, it usually isn't necessary to specifically bind an address and port number to a socket, because the connect() and send() calls automatically bind an appropriate address if they are used with an unbound socket. To bind an Internet address, use the bind() system call like this:

#include <sys/types.h>
#include <netinet/in.h>
 ...
struct sockaddr_in sin;
 ...
bind(s, (struct sockaddr *)&sin, sizeof(sin));


Note: Selecting what to place in the address sin requires some discussion. See “Network Library Routines” for information about formulating Internet addresses and the library routines used in name resolution.
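For the simple case of a server that accepts requests on any local address, the address might be filled in as sketched below. The helpers bind_inet_port() and bound_port() are hypothetical, and INADDR_ANY, htons(), getsockname(), and the socklen_t type are standard names that this section does not otherwise introduce.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket bound to the given port on
 * any local address. A port of 0 asks the system to choose one.
 * Returns the descriptor, or -1 on error.
 */
int bind_inet_port(unsigned short port)
{
    struct sockaddr_in sin;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    sin.sin_port = htons(port);               /* network byte order */

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Return the local port bound to s in host byte order, or -1 on error. */
int bound_port(int s)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(s, (struct sockaddr *)&sin, &len) < 0)
        return -1;
    return ntohs(sin.sin_port);
}
```

Calling bind_inet_port(0) followed by bound_port() shows which port the system selected.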


Establishing Socket Connections

Stream socket connections are usually established asymmetrically, with one process a client and the other a server. When it offers its advertised services, the server binds a socket to a well-known address associated with the service and then passively listens on its socket. It is then possible for an unrelated process to rendezvous with the server.


Note: For details about datagram sockets, see “Connectionless Sockets”.

The client requests services from the server by initiating a connection to the server's socket. On the client side, the connect() call is used to initiate a connection. Using the UNIX domain, this might appear as:

struct sockaddr_un server;
 ...
connect(s, (struct sockaddr *)&server,
    strlen(server.sun_path) +
    sizeof(server.sun_family));

Using the Internet domain, this might appear as:

struct sockaddr_in server;
 ...
connect(s, (struct sockaddr *)&server, sizeof(server));

In the preceding examples, server contains either the UNIX pathname or the Internet address and port number of the server to contact. If the client process's socket is unbound at the time of the connect() call, the system will automatically select and bind a name to the socket if necessary. This is the way local addresses are usually bound to a socket.

The connect() call returns an error if the connection attempt was unsuccessful (any name automatically bound by the system, however, remains). Otherwise, the socket is associated with the server, and data transfer can begin.

When a connection attempt fails, an error is returned and the global variable errno is set to indicate the error. Table 2-1 lists some of the more common errno values.

Table 2-1. Common errno values

Value

Explanation

ETIMEDOUT

This error indicates that after failing to establish a connection for a period of time, the system stopped trying. It usually occurs because the destination host is down or because problems in the network resulted in lost transmissions.

ECONNREFUSED

This error indicates that the host has refused service. It usually occurs because a server process is not present at the requested port on the host. It may also indicate an explicit refusal due to access control.

EHOSTDOWN, ENETDOWN

These errors describe status information delivered to the client host by the underlying communication services.

EHOSTUNREACH, ENETUNREACH

These errors can occur either because the network or host is unknown (no route to the network or host is present) or because of status information returned by intermediate gateways or switching nodes. Many times the status returned is not sufficient to determine if a network or host is down, in which case the system indicates that the entire network is unreachable.
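A client might turn these errno values into messages for the user. The following sketch is one way to do so; connect_error_text() is a hypothetical helper, and its wording paraphrases Table 2-1.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map an errno value left by a failed connect()
 * to a short explanation based on Table 2-1.
 */
const char *connect_error_text(int err)
{
    switch (err) {
    case ETIMEDOUT:
        return "timed out: host down or transmissions lost";
    case ECONNREFUSED:
        return "refused: no server at that port, or access denied";
    case EHOSTDOWN:
    case ENETDOWN:
        return "host or network reported down";
    case EHOSTUNREACH:
    case ENETUNREACH:
        return "no route to host or network";
    default:
        return strerror(err);   /* fall back to the system message */
    }
}
```

After a failed connect(), a caller could print connect_error_text(errno) before deciding whether to retry.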

To receive a client's connection, the server must perform two steps after binding its socket: it indicates that it is ready to listen for incoming connection requests, and then it accepts the connection.

To indicate that a socket is ready to listen for incoming connection requests, use the listen() call (see listen(2)):

listen(s, 5);

The second parameter of the listen() call specifies the maximum number of outstanding connections that can be queued awaiting acceptance by the server process; this number is limited by the system to catch flagrant abuses of system resources. If a connection is requested while the queue is full, the connection is not refused, but the individual messages that make up the request are ignored. This gives a busy server time to make room in its pending connection queue while the client retries the connection request. Had the request instead been refused with the ECONNREFUSED error, the client would be unable to determine whether the server was up.

It is still possible to get the ETIMEDOUT error back, though this is unlikely. The backlog figure supplied with the listen() call is limited to a very large value (currently 1000). Applications should limit the backlog parameter to a value consistent with a server's usage.

With a socket marked as listening, a server can accept a connection by using the accept() system call (see accept(2)):

struct sockaddr_in from;
int fromlen = sizeof (from);
newsock = accept(s, (struct sockaddr *)&from, &fromlen);


Note: For the UNIX domain, from would be declared as a struct sockaddr_un, but the rest of this example would remain the same. The examples that follow describe only Internet-domain routines.

A new descriptor is returned on receipt of a connection (along with a new socket). To identify the client, a server can supply a buffer for the client socket's name. The server initializes the value-result parameter fromlen to indicate how much space is associated with from. The parameter is then modified on return to reflect the true size of the name. If the client's name is not of interest, the second parameter can be a null pointer.

The accept() call normally blocks. That is, accept() will not return until a connection is available or the system call is interrupted by a signal to the program. Furthermore, a program cannot indicate it will accept connections from only a specific individual or individuals. It is up to the program to consider whom the connection is from and close down the connection if it does not wish to speak to the remote program. If the server program wants to accept connections on more than one socket, or wants to avoid blocking on the accept call, there are alternatives; see “The Client/Server Model” for details.
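The bind, listen, connect, and accept sequence can be exercised within a single process, as in the sketch below. It is an illustration only: accept_one() and unix_addr() are hypothetical helpers, the UNIX domain is used so that no network is required, and socklen_t is the current POSIX type for the value-result length (the fragments above use int).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a UNIX domain address; returns -1 if path is too long. */
static int unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/*
 * Hypothetical demonstration: listen on path, connect a client from
 * the same process, accept it, and pass one byte from client to
 * server. Returns the byte received, or -1 on error.
 */
int accept_one(const char *path)
{
    struct sockaddr_un addr, from;
    socklen_t fromlen = sizeof(from);   /* value-result parameter */
    int ls = -1, cs = -1, ns = -1, result = -1;
    char c = 0;

    if (unix_addr(&addr, path) < 0)
        return -1;
    unlink(path);                       /* remove any stale name */

    ls = socket(AF_UNIX, SOCK_STREAM, 0);
    cs = socket(AF_UNIX, SOCK_STREAM, 0);
    if (ls < 0 || cs < 0)
        goto done;

    if (bind(ls, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(ls, 5) < 0)
        goto done;

    /* The request waits in the listen queue, so connect() can
     * complete before accept() is called. */
    if (connect(cs, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    ns = accept(ls, (struct sockaddr *)&from, &fromlen);
    if (ns < 0)
        goto done;

    if (write(cs, "A", 1) == 1 && read(ns, &c, 1) == 1)
        result = c;

done:
    if (ns >= 0) close(ns);
    if (cs >= 0) close(cs);
    if (ls >= 0) close(ls);
    unlink(path);
    return result;
}
```

In a real application the client and server would be separate processes; the single-process form only keeps the sketch short.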

Transferring Data

IRIX has several system calls for reading and writing information. The simplest calls are read() and write() (see read(2) and write(2)). They take as arguments a descriptor, a pointer to a buffer containing the data, and the size of the data:

char buf [100];
 ...
write(s, buf, sizeof (buf));
read(s, buf, sizeof (buf));

The descriptor may indicate a file or a connected socket. “Connected” can mean either a connected stream socket or a datagram socket for which a connect() call has provided a default destination. The write() call requires a connected socket, since no destination is specified in the parameters of the system call. The read() call can be used for either a connected or an unconnected socket. These calls are, therefore, quite flexible and may be used to write applications that do not require assumptions about the source of their input or the destination of their output.

The readv() and writev() calls (for read and write vector; see read(2) and write(2)) are variations of the read() and write() calls that allow the data to be read into or written from several separate buffers, while retaining the flexibility to handle both files and sockets.
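As a brief illustration of vector output, the following sketch writes a header and a body held in separate buffers with one call. The helper write_two() is hypothetical; struct iovec and <sys/uio.h> are the standard definitions used by these calls.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a header and a body held in separate
 * buffers with a single writev() call. Returns the number of bytes
 * written, or -1 on error.
 */
ssize_t write_two(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)header;   /* writev() does not modify it */
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}
```

The descriptor fd may refer to a file, a pipe, or a connected socket.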

Sometimes it's necessary to send high-priority data over a connection that may have unread low-priority data at the other end. For example, a user interface process may be interpreting commands and sending them on to another process through a stream connection. The user interface may have filled the stream with as-yet-unprocessed requests when the user types a command to cancel all outstanding requests. Rather than have the high-priority data wait to be processed after the low-priority data, it is possible to send it as out-of-band (OOB) data. OOB data is specific to stream sockets and is discussed in “Out-of-Band Data”.

The send() and recv() calls (see send(2) and recv(2)) are similar to write() and read(), but they allow options, including sending and receiving OOB information:

send(s, buf, sizeof (buf), flags);
recv(s, buf, sizeof (buf), flags);

These calls are used only with connected sockets; specifying a descriptor for a file will result in an error.

While send() and recv() are virtually identical to write() and read(), the addition of the flags argument is important. The flags are defined in <sys/socket.h> and can have nonzero values if one or more of the following are required:

MSG_PEEK         look at data without reading
MSG_OOB          send/receive out-of-band data
MSG_DONTROUTE    send data without routing packets

To preview data, specify MSG_PEEK with a recv() call. With this flag, recv() lets a process read data without removing it from the stream; the next read() or recv() call applied to the socket returns the data previously previewed.

One use of this facility is to read ahead in a stream to determine the size of the next item to be read. The option to have data sent in outgoing packets without routing is used only by the routing table management process.
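Reading ahead to learn the size of the next item might look like the sketch below. It assumes, purely for illustration, that each item on the stream begins with a 4-byte length in network byte order; that framing and the helper peek_length() are hypothetical.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: peek at a 4-byte length prefix (network byte
 * order) at the head of the stream without consuming it. Returns 0
 * and stores the length on success, or -1 if four bytes could not be
 * previewed.
 */
int peek_length(int s, uint32_t *len)
{
    uint32_t wire;
    ssize_t n = recv(s, &wire, sizeof(wire), MSG_PEEK);

    if (n != (ssize_t)sizeof(wire))
        return -1;
    *len = ntohl(wire);
    return 0;
}
```

Because the prefix stays in the stream, a later recv() without MSG_PEEK returns the same four bytes again.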

To send datagrams, one must be allowed to specify the destination. The call sendto() (see sendto(2)) takes a destination address as an argument and is therefore used for sending datagrams. The call recvfrom() (see recvfrom(2)) is often used to read datagrams, since this call returns the address of the sender, if it is available, along with the data. If the identity of the sender does not matter, one may use read() or recv().

Finally, there is a pair of calls that allow you to send and receive messages from multiple buffers (the sender must specify the address of the recipient). These are sendmsg() and recvmsg() (see sendmsg(2) and recvmsg(2)). These calls are actually quite general and have other uses, including, in the UNIX domain, the transmission of a file descriptor from one process to another.

Discarding Sockets

A socket is discarded by closing the descriptor; use the close() system call (see close(2)):

close(s);

If data is associated with a socket that promises reliable delivery (for example, a stream socket) when a close takes place, the system will continue trying to transfer the data. However, after a period of time, undelivered data is discarded. Should you have no use for any pending data, perform a shutdown() on the socket prior to closing it:

shutdown(s, how);

The value how is 0 if you do not want to read data, 1 if no more data will be sent, or 2 if no data is to be sent or received.
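A how value of 1 is useful when a process has finished sending but still expects a reply. The following sketch shows the idea; send_last() is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: send a final message, then tell the peer that
 * no more data will follow while leaving the receive side open.
 * Returns 0 on success, -1 on error.
 */
int send_last(int s, const char *msg, size_t len)
{
    if (send(s, msg, len, 0) != (ssize_t)len)
        return -1;
    return shutdown(s, 1);      /* 1: no more data will be sent */
}
```

On a stream socket, the peer reads the final message and then sees end-of-file, but it can still send data back on the same connection.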

Scheduled Transfers Sockets

Nearly all SGI socket system calls, such as accept(), bind(), and connect(), support ST sockets. ST sockets must be connected before they can send or receive data. Therefore, the sendto() and recvfrom() calls are not supported for ST sockets.

Connectionless Sockets

The sockets described so far follow a connection-oriented model. However, connectionless interactions, typical of the datagram facilities found in contemporary packet-switched networks, are also supported. A datagram socket provides a symmetric interface to data exchange. While processes are still likely to be client and server, there is no requirement for connection establishment. Instead, each message includes the destination address.

Datagram sockets are created as described in “Creating Sockets”. If a particular local address is needed, the bind() operation must precede the first data transmission. Otherwise, the system will set the local address and/or port when data is first sent.

To send data, use the sendto() system call:

sendto(s, buf, buflen, flags, (struct sockaddr *)&to,
       sizeof(to));

The s, buf, buflen, and flags parameters are used as described for the send() call (see “Transferring Data”). The to value indicates the destination address. On an unreliable datagram interface, errors probably will not be reported to the sender. When information is present locally to recognize a message that cannot be delivered (for instance, when a network is unreachable), the call will return -1 and the global variable errno will contain an error number.

To receive messages on an unconnected datagram socket, use the recvfrom() call:

recvfrom(s, buf, buflen, flags, (struct sockaddr *)&from,
         &fromlen);

Once again, the value-result parameter, fromlen, initially contains the size of the from buffer and is modified on return to indicate the actual size of the address from which the datagram was received. If you don't care who the sender is, use 0 for the &from and &fromlen parameters.
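The two calls can be tried together in one process, as in the following sketch. It is only an illustration: dgram_round_trip() and bound_dgram() are hypothetical helpers, and UNIX domain datagram sockets are used so that the example needs no network.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a UNIX domain datagram socket bound to path, or return -1. */
static int bound_dgram(const char *path, struct sockaddr_un *addr)
{
    int s;

    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    unlink(path);                           /* remove any stale name */

    s = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (s >= 0 && bind(s, (struct sockaddr *)addr, sizeof(*addr)) < 0) {
        close(s);
        s = -1;
    }
    return s;
}

/*
 * Hypothetical demonstration: send one datagram from a socket named
 * src to a socket named dst. On success, returns the message length
 * and copies the sender's name, as reported by recvfrom(), into
 * sender (which must hold at least sizeof(sun_path) bytes).
 */
ssize_t dgram_round_trip(const char *src, const char *dst,
                         const char *msg, char *sender)
{
    struct sockaddr_un src_addr, dst_addr, from;
    socklen_t fromlen = sizeof(from);       /* value-result parameter */
    char buf[256];
    ssize_t n = -1;
    int a = bound_dgram(src, &src_addr);
    int b = bound_dgram(dst, &dst_addr);

    if (a >= 0 && b >= 0 &&
        sendto(a, msg, strlen(msg), 0,
               (struct sockaddr *)&dst_addr, sizeof(dst_addr)) >= 0) {
        memset(&from, 0, sizeof(from));
        n = recvfrom(b, buf, sizeof(buf), 0,
                     (struct sockaddr *)&from, &fromlen);
        if (n >= 0)
            strcpy(sender, from.sun_path);
    }

    if (a >= 0) close(a);
    if (b >= 0) close(b);
    unlink(src);
    unlink(dst);
    return n;
}
```

The sender binds a name of its own so that recvfrom() has an address to report; an unbound UNIX domain sender would appear without a name.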

In addition to sendto() and recvfrom(), datagram sockets can use the connect() call to associate a socket with a specific destination address. In this case, any data sent on the socket will automatically be addressed to the connected peer, and only data received from that peer will be delivered to the user.

Only one connected address is permitted for each socket at a time; a second connect() will change the destination address, and a connect() to a “null” address (family AF_UNSPEC) will cause a disconnection.

Connection requests on datagram sockets return immediately, because the request simply results in the system recording the peer's address. Connection requests on a stream socket, however, do not return immediately; the request initiates the establishment of an end-to-end connection. (The accept() and listen() calls are not used with datagram sockets.)

While a datagram socket is connected, errors from recent send() calls can be returned asynchronously. These errors can be reported on subsequent operations on the socket or by using a special socket option, SO_ERROR, with getsockopt() that can be used to interrogate the error status. A select() for reading or writing will return true when an error indication has been received. The next operation will return the error, and the error status is cleared. For additional details about datagram sockets, see “Advanced Topics”.
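Interrogating the error status with SO_ERROR might be done as in this sketch. The helper pending_error() is hypothetical; SOL_SOCKET is the standard socket-level option constant.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch and clear the pending error on a socket.
 * Returns the pending errno value (0 if none), or -1 if getsockopt()
 * itself fails.
 */
int pending_error(int s)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

A program could call pending_error() after select() reports the socket as ready, before attempting the next send() or recv().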

I/O Multiplexing

You can multiplex I/O requests among multiple sockets and/or files by using the select() call:

#include <sys/time.h>
#include <sys/types.h>
 ...
fd_set readmask, writemask, exceptmask;
struct timeval timeout;
 ...
select(nfds, &readmask, &writemask, &exceptmask, &timeout);

The select() call takes pointers to three sets of file descriptors as arguments:

  • one for the set of file descriptors on which the caller wants to read data

  • one for the set of file descriptors on which data is to be written

  • one for which exceptional conditions are pending (out-of-band data is the only exceptional condition currently implemented)

If you are not interested in certain conditions (that is, read, write, or exceptions), the corresponding argument to the select() call should be a null pointer.

Each set is a structure containing an array of long integer bit masks. The size of the array is set by the definition FD_SETSIZE; the array is long enough to hold one bit for each of the FD_SETSIZE possible file descriptors.

The set should be zeroed before use. To clear the set mask, use this macro:

FD_ZERO(&mask)

To add and remove the file descriptor fd in the set mask, use these macros:

FD_SET(fd, &mask)
FD_CLR(fd, &mask)

The parameter nfds in the select() call specifies the range of file descriptors (one plus the value of the largest descriptor) to be examined in a set.

You can specify a timeout value if the selection should not last more than a predetermined period of time. If the fields in timeout are set to 0, the selection takes the form of a poll, returning immediately. If timeout is a null pointer, the selection will block indefinitely. To be more specific, a return takes place only when a descriptor is selectable or when a signal is received by the caller, interrupting the system call.

The select() call normally returns the number of file descriptors selected. If the select() call returns because the timeout expires, the value 0 is returned. If the select() call terminates because of an error or interruption, -1 is returned with the error number in errno, and with the file descriptor masks unchanged.

For a successful return, the three sets will indicate which file descriptors are ready to be read from, written to, or have exceptional conditions pending. The status of a file descriptor in a select mask can be tested with this macro:

FD_ISSET(fd, &mask)

This macro returns a nonzero value if fd is a member of the set mask, and 0 if it is not.
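The macros and return values described so far can be combined into a small helper that waits on a single descriptor. This is a minimal sketch; wait_readable() is a hypothetical name, and <sys/select.h> is the header that current POSIX systems provide for select().

```c
#include <sys/time.h>
#include <sys/types.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long sec)
{
    fd_set readmask;
    struct timeval timeout;
    int n;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;              /* descriptor does not fit in the set */

    FD_ZERO(&readmask);
    FD_SET(fd, &readmask);
    timeout.tv_sec = sec;       /* 0 makes this a poll */
    timeout.tv_usec = 0;

    /* nfds is one plus the largest descriptor in any set. */
    n = select(fd + 1, &readmask, (fd_set *)0, (fd_set *)0, &timeout);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return FD_ISSET(fd, &readmask) ? 1 : 0;
}
```

Calling wait_readable(s, 0) on a listening socket is one way to find out whether accept() would block.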

To check for read readiness on a socket to be used with an accept() call, use select() followed by the FD_ISSET(fd, &mask) macro. If FD_ISSET returns a nonzero value, indicating permission to read, then a connection is pending on the socket.

For example, to read data from two sockets, s1 and s2, as the data becomes available and with a one-second timeout, this code might be used:

#include <sys/time.h>
#include <sys/types.h>
 ...
fd_set read_template;
struct timeval wait;
int nb;
 ...
for (;;) {
    wait.tv_sec = 1;        /* one second */
    wait.tv_usec = 0;
    FD_ZERO(&read_template);
    FD_SET(s1, &read_template);
    FD_SET(s2, &read_template);
    nb = select(FD_SETSIZE, &read_template, (fd_set *) 0,
                (fd_set *) 0, &wait);
    if (nb <= 0) {
        /*
         *  An error occurred during the select, or
         *  the select timed out. No descriptor is
         *  ready, so skip the tests below.
         */
        continue;
    }
    if (FD_ISSET(s1, &read_template)) {
        /* Socket #1 is ready to be read from. */
    }
    if (FD_ISSET(s2, &read_template)) {
        /* Socket #2 is ready to be read from. */
    }
}


Note: In 4.2BSD, the arguments to select() were pointers to integers instead of pointers to fd_sets. This type of call will still work, as long as the largest file descriptor is numerically less than the number of bits in an integer (that is, 32). However, the methods illustrated above should be used in all current programs.

The select() call provides a synchronous multiplexing scheme. Asynchronous notification of output completion, input availability, and exceptional conditions is possible through the use of the SIGIO and SIGURG signals.

Network Library Routines

When you use the IPC facilities in a distributed environment, programs need to locate and construct network addresses. This section discusses the library routines you can use to manipulate Internet network addresses.

Locating a service on a remote host requires many levels of mapping before client and server can communicate. A service is assigned a name, such as login server, that humans can easily understand. This name, and the name of the peer host, must then be translated into network addresses. Finally, the address is used to locate a physical location and route to the service.

The specifics of these mappings can vary among network architectures. For instance, it is desirable that a network not require a host to have a name indicating its physical location to a client host. Instead, underlying services in the network can discover the actual location of the host at the time the client host wants to communicate.

This ability to have hosts named in a location-independent manner can induce overhead in connection establishment, as a discovery process must take place, but it allows a host to be physically mobile. The host does not have to notify its clients of its current location.

Standard routines are provided for these mappings:

  • host names to network addresses

  • network names to network numbers

  • protocol names to protocol numbers

  • service names to port numbers

Routines also indicate the appropriate protocol to use to communicate with the server process. The file <netdb.h> must be included when using any of these routines.

Host Names

The hostent data structure provides Internet host Name-to-Address Mapping:

struct    hostent {
    char  *h_name;       /* official name of host */
    char  **h_aliases;   /* alias list */
    int   h_addrtype;    /* host address type (e.g., AF_INET) */
    int   h_length;      /* length of address */
    char  **h_addr_list; /* list of addresses, */
                         /* null-terminated */
};
/* first address, in network byte order, for backward
 * compatibility: */
#define h_addr    h_addr_list[0]

The routine gethostbyname() takes an Internet host name and returns a hostent structure, while the routine gethostbyaddr() maps Internet host addresses into a hostent structure (see gethostbyname(3N) and gethostbyaddr(3N)).

These routines return the official name of the host and its public aliases, along with the address type (family) and a null-terminated list of variable-length addresses. The list of addresses is required because a host may have many addresses, all associated with the same name. The h_addr definition is provided for backward compatibility and is defined as the first address in the list of addresses in the hostent structure.

The database for these calls is provided either by the /etc/hosts file—see hosts(4)—or by using the Internet domain name server, named (see named(1M)). The database can also come from the NIS, if you have the NFS option. Because of the differences in these databases and their access protocols, the information returned can differ. When using the host table or NIS versions of gethostbyname(), the call returns only one address but includes all listed aliases. When using the name server version, the calls can return alternate addresses, but they will not provide any aliases other than the one given as the argument.
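
The fields of the hostent structure can be walked as in the following minimal sketch. The helper name print_hostent() is hypothetical, and the sketch assumes IPv4 (AF_INET) addresses only; it is an illustration, not part of the library.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Print the official name, the aliases, and every IPv4 address held in
 * *hp.  Returns the number of addresses printed.  print_hostent() is a
 * hypothetical helper, not a library routine. */
int print_hostent(const struct hostent *hp, FILE *out)
{
    char **p;
    int count = 0;

    fprintf(out, "official name: %s\n", hp->h_name);
    for (p = hp->h_aliases; p != NULL && *p != NULL; p++)
        fprintf(out, "alias: %s\n", *p);

    /* This sketch understands only 4-byte IPv4 addresses. */
    if (hp->h_addrtype != AF_INET ||
        hp->h_length != (int)sizeof(struct in_addr))
        return 0;

    /* h_addr_list is a null-terminated array of network-order addresses. */
    for (p = hp->h_addr_list; p != NULL && *p != NULL; p++) {
        struct in_addr addr;

        memcpy(&addr, *p, sizeof addr);   /* entries may be unaligned */
        fprintf(out, "address: %s\n", inet_ntoa(addr));
        count++;
    }
    return count;
}
```

A caller would typically pass the result of gethostbyname() after checking it for NULL, for example print_hostent(hp, stdout).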

Network Names

The netent data structure defines the Network-Name-to-Network-Number Mapping used with the getnetbyname(), getnetbyaddr(), and getnetent() routines (see getnetbyname(3N), getnetbyaddr(3N), and getnetent(3N)):

/*
 * Assumption here is that a network number
 * fits in 32 bits.
 */
struct netent {
    char  *n_name;     /* official name of net */
    char  **n_aliases; /* alias list */
    int   n_addrtype;  /* net address type */
    int   n_net;       /* network number, host byte order */
};

These routines are the network counterparts to the host routines described in the preceding section. The routines extract their information from /etc/networks or from the NIS if the NFS option is installed.

Protocol Names

The protoent data structure defines the protocol Name-to-Number Mapping used with the routines getprotobyname(), getprotobynumber(), and getprotoent() (see getprotobyname(3N), getprotobynumber(3N), and getprotoent(3N)):

struct   protoent {
    char  *p_name;      /* official protocol name */
    char  **p_aliases;  /* alias list */
    int   p_proto;      /* protocol number */
};

The routines extract their information from /etc/protocols or from the NIS if the NFS option is installed.

Service Names

A service is expected to reside at a specific port and employ a particular communication protocol. This view is consistent with the Internet domain but is inconsistent with other network architectures. Furthermore, a service can reside on multiple ports. If it does, the higher-level library routines will have to be bypassed or extended. Services available are obtained from the file /etc/services or from the NIS if the NFS option is installed.

The servent structure defines the service Name-to-Port-Number Mapping:

struct   servent {
    char  *s_name;     /* official service name */
    char  **s_aliases; /* alias list */
    int   s_port;      /* port #, network byte order */
    char  *s_proto;    /* protocol to use */
};

The routine getservbyname() (see getservbyname(3N)) maps service names to a servent structure by specifying a service name and, optionally, a qualifying protocol.

The following returns the service specification for a TELNET server using any protocol:

sp = getservbyname("telnet", (char *) 0);

This returns only the TELNET server that uses the TCP protocol:

sp = getservbyname("telnet", "tcp");

The routines getservbyport() and getservent() also provide service mappings (see getservbyport(3N) and getservent(3N)). The getservbyport() routine has an interface similar to that of getservbyname(): you can specify an optional protocol name to qualify lookups.
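
As a minimal sketch, a lookup can be wrapped so that the caller receives the port in host byte order. The helper name service_port() is hypothetical; the result depends on the contents of the local services database.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/* Return the port for service `name` over protocol `proto` in host
 * byte order, or -1 if the services database has no such entry.
 * Pass NULL for proto to accept any protocol.  service_port() is a
 * hypothetical helper, not a library routine. */
int service_port(const char *name, const char *proto)
{
    struct servent *sp = getservbyname(name, proto);

    if (sp == NULL)
        return -1;
    /* s_port holds a 16-bit value in network byte order. */
    return (int)ntohs((unsigned short)sp->s_port);
}
```

The value returned is suitable for printing. When filling in a sockaddr_in structure, copy sp->s_port directly instead, because the system expects network byte order.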

Network Dependencies

With the support routines already described, an Internet application program rarely has to deal directly with addresses. This allows services to operate as much as possible in a network-independent fashion. However, purging all network dependencies is difficult. As long as the user must supply network addresses when naming services and sockets, some network dependency is required in a program. For example, the normal code included in client programs, such as the remote login program, takes the form shown in Example 2-1:

Example 2-1. A Remote-Login Client

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <netdb.h>

main(argc, argv)
    int argc;
    char *argv[];
{
    struct sockaddr_in server;
    struct servent *sp;
    struct hostent *hp;
    int s;
    ...
    sp = getservbyname("login", "tcp");
    if (sp == NULL) {
        fprintf(stderr,
                "rlogin: tcp/login: unknown service\n");
        exit(1);
    }
    hp = gethostbyname(argv[1]);
    if (hp == NULL) {
        fprintf(stderr,
                "rlogin: %s: unknown host\n", argv[1]);
        exit(2);
    }
    bzero((char *)&server, sizeof (server));
    bcopy(hp->h_addr, (char *)&server.sin_addr,
          hp->h_length);
    server.sin_family = hp->h_addrtype;
    server.sin_port = sp->s_port;
    s = socket(hp->h_addrtype, SOCK_STREAM, 0);
    if (s < 0) {
        perror("rlogin: socket");
        exit(3);
    }
    ...
    /* Connect does the bind() for us */
    if (connect(s, (struct sockaddr *)&server,
        sizeof (server)) < 0) {
        perror("rlogin: connect");
        exit(4);
    }
}


Note: To make the remote login program independent of the Internet protocols and addressing scheme, the program would have to have a layer of routines that masked the network-dependent aspects from the mainstream login code. For the current facilities available in the system, this does not appear worthwhile.



Byte Ordering

In addition to the address-related database routines, several other routines are available to simplify manipulation of names and addresses. Table 2-2 summarizes the routines that manipulate variable-length byte strings and handle byte swapping of network addresses and values.

Table 2-2. C Run-time Routines

Call               Synopsis

bcmp(s1, s2, n)    Compare byte strings; 0 if same, not 0 otherwise.
bcopy(s1, s2, n)   Copy n bytes from s1 to s2.
bzero(base, n)     Zero-fill n bytes starting at base.
htonl(val)         (host-to-network-long) Convert 32-bit quantity from host to network byte order.
htons(val)         (host-to-network-short) Convert 16-bit quantity from host to network byte order.
ntohl(val)         (network-to-host-long) Convert 32-bit quantity from network to host byte order.
ntohs(val)         (network-to-host-short) Convert 16-bit quantity from network to host byte order.

The format of the socket address is specified, in part, by standards within the Internet domain. The specification includes the order of the bytes in the address (called the network byte order). Addresses supplied to system calls must be in network byte order; values returned by the system also have this ordering. Because machines differ in the internal representation of integers, examining an address as returned by getsockname() or getservbyname() (see getsockname(2) or getservbyname(3N)) may result in a misinterpretation. To use the number, it is necessary to call the routine ntohs() to convert the number from the network representation to the host's representation. For example:

printf("port number %d\n", ntohs(sp->s_port));

On machines that have “big-endian” byte ordering, such as the IRIS, ntohs() is a null operation. On machines with “little-endian” ordering, such as the VAX™, it swaps the bytes. A companion routine, htons(), converts a short integer from the host format to the network format; the ntohl() and htonl() routines do the same for long integers. Any protocol that transfers integer data between machines with different byte orders should use these routines. The library routines that return network addresses and ports provide them in network order so that they can simply be copied into the structures provided to the system.
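
The effect of the conversion routines can be seen by looking at the bytes in memory. The following sketch, which uses the hypothetical helpers put_port() and get_port(), stores a 16-bit port in a two-byte buffer in network byte order; the most significant byte comes first regardless of the byte order of the host.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order port into a 2-byte wire buffer in network order.
 * put_port() and get_port() are hypothetical helpers. */
void put_port(unsigned char wire[2], uint16_t host_port)
{
    uint16_t net = htons(host_port);

    memcpy(wire, &net, sizeof net);
}

/* Recover the host-order port from a 2-byte wire buffer. */
uint16_t get_port(const unsigned char wire[2])
{
    uint16_t net;

    memcpy(&net, wire, sizeof net);
    return ntohs(net);
}
```

For example, after put_port(wire, 513) the buffer holds the bytes 0x02 and 0x01 in that order, because 513 is 0x0201.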

Translation Functions

This section describes the functionality of some of the network library translation routines available with the IRIX operating system. Topics in this section include the following:

  • Node name-to-address and address-to-name translation functions

  • Interface index-to-name and interface name-to-index translation functions

The following translation interfaces are included in section 3N of the man pages:

  • freeaddrinfo

  • freehostent

  • gai_strerror

  • getaddrinfo

  • getipnodebyname

  • getipnodebyaddr

  • getnameinfo

  • if_freenameindex

  • if_indextoname

  • if_nameindex

  • if_nametoindex

  • inet_ntop

  • inet_pton

Node Names and Service Names

The interfaces getaddrinfo, getnameinfo, and freeaddrinfo provide a simplified way to translate between the names and addresses of a service on a node.

You can use the getaddrinfo function instead of calling gethostbyname and getservbyname. The getaddrinfo() function is protocol independent and thread safe and can be used with both IPv6 and IPv4 addresses.

The getaddrinfo() function uses the addrinfo structure, which is defined in <netdb.h>, as follows:

struct addrinfo {
   int             ai_flags;       /* AI_PASSIVE, AI_CANONNAME, etc */
   int             ai_family;      /* AF_xxx */
   int             ai_socktype;    /* SOCK_xxx */
   int             ai_protocol;    /* 0 or IPPROTO_xxx for IPv4,IPv6 */ 
   socklen_t       ai_addrlen;     /* length of ai_addr */
   char            *ai_canonname;  /* canonical name for node name */
   struct sockaddr *ai_addr;       /* binary address */
   struct addrinfo *ai_next;       /* next structure in linked list */
};

The following example shows the use of getaddrinfo(), freeaddrinfo(), and gai_strerror():

#include <sys/socket.h>
#include <netdb.h>

struct addrinfo hints;
struct addrinfo *res, *ressave;
int err_num;
char *nodename, *servname;

nodename = argv[1];
servname = argv[2];
bzero(&hints, sizeof(hints));
hints.ai_flags = AI_ADDRCONFIG | AI_CANONNAME;
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;

err_num = getaddrinfo(nodename, servname, &hints, &res);
if(err_num) {
   fprintf(stderr, "getaddrinfo: %s for nodename %s servname %s\n",
      gai_strerror(err_num), nodename, servname);
   exit(1);
}
ressave = res; /* save res pointer to be freed later */
...         /* process the information returned */
freeaddrinfo(ressave);

The gai_strerror() function returns an error message string that corresponds to an EAI_xxx code returned by the getaddrinfo() or getnameinfo() function; it does not print the message itself. The information that getaddrinfo() returns is dynamically allocated. To prevent memory leaks, use the freeaddrinfo() function to free the addrinfo structures along with any additional storage associated with those structures.
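
The step marked “process the information returned” in the example above usually means walking the linked list through ai_next. The following is a minimal sketch of such a walk; the helper name describe_addrinfo() is hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>

/* Walk the list returned by getaddrinfo() and print one line per
 * entry.  Returns the number of entries.  describe_addrinfo() is a
 * hypothetical helper, not a library routine. */
int describe_addrinfo(const struct addrinfo *res, FILE *out)
{
    const struct addrinfo *ai;
    int count = 0;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        const char *family = ai->ai_family == AF_INET  ? "IPv4" :
                             ai->ai_family == AF_INET6 ? "IPv6" : "other";

        fprintf(out, "%s, socktype %d, protocol %d, addrlen %lu\n",
                family, ai->ai_socktype, ai->ai_protocol,
                (unsigned long)ai->ai_addrlen);
        /* ai_canonname is set only when AI_CANONNAME was requested,
         * and then only in the first entry of the list. */
        if (ai->ai_canonname != NULL)
            fprintf(out, "  canonical name: %s\n", ai->ai_canonname);
        count++;
    }
    return count;
}
```

The walk does not modify the list, so the original pointer can still be passed to freeaddrinfo() afterward.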

The getnameinfo function translates the contents of a socket address structure to a node name and/or service name. The getnameinfo() function is thread safe.

The following example shows the use of getnameinfo:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>

...
struct sockaddr_storage ss;
socklen_t               sslen;
int                     oldsock, newsock;
int                     err_num;
char                    nodename[NI_MAXHOST];
char                    servname[NI_MAXSERV];

... /* socket(), bind() and listen() calls */
sslen = sizeof(ss);
newsock = accept(oldsock, (struct sockaddr *)&ss, &sslen);
if(newsock == -1) {
...
}
err_num = getnameinfo((struct sockaddr *)&ss, sslen, nodename,
             sizeof(nodename), servname, sizeof(servname), 0);
if(err_num) {
       fprintf(stderr, "getnameinfo: %s\n", gai_strerror(err_num));
       exit(1);
}
else {
       printf("Connection from %s for service %s\n",
             nodename, servname);
}
...

Node Name Mapping

You can use functions such as getipnodebyname() and getipnodebyaddr() to map a node name to a hostent structure and a node address to a hostent structure, respectively. These functions are very similar to gethostbyname and gethostbyaddr, but they require additional arguments for specifying address family and operation modes, and they are not protocol-independent. These routines return a dynamically allocated hostent structure containing the name of the host, its aliases, the address type, and a list of addresses. The calling program must call the freehostent() function to free the memory used by the hostent structure that these functions allocate.

Functions such as inet_ntoa() and inet_addr() that convert IPv4 addresses between binary and printable form are specific to 32-bit IPv4 addresses only. The new functions inet_ntop() and inet_pton() can be used to convert both 32-bit IPv4 addresses and 128-bit IPv6 addresses.

The inet_ntop() function maps an IPv4 or IPv6 network format address to a printable format address and inet_pton() maps an IPv4 or IPv6 printable format address to a network format address.

The following example shows the use of inet_ntop:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>    /* inet_ntop() */
#include <netdb.h>
#include <stdio.h>

struct addrinfo hints, *res, *ressave;
int s, err_num;
char *nodename, *servname;
char hname[INET_ADDRSTRLEN];

bzero(&hints, sizeof(struct addrinfo)); 
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;

err_num = getaddrinfo(nodename, servname, &hints, &res); 
if(err_num) {
   ... /* exit */

}
ressave = res;
do {
   s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
   if (s < 0)
      continue;
   if(connect(s, res->ai_addr, res->ai_addrlen) >= 0) {
      inet_ntop(AF_INET,
             &(((struct sockaddr_in *)res->ai_addr)->sin_addr),
                hname, sizeof(hname));
      printf("Connected to %s\n", hname);
      break;
   }
   else {
      ...
   }
} while ((res = res->ai_next) != NULL);
...
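
The example above handles IPv4 addresses only. As a minimal sketch of using the two functions together for either family, the hypothetical helper normalize_address() converts a printable address to binary form with inet_pton() and back to printable form with inet_ntop().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Convert a printable address to binary and back again.  `af` is
 * AF_INET or AF_INET6.  Returns 0 on success, -1 if `text` is not a
 * valid address for that family or `out` is too small.
 * normalize_address() is a hypothetical helper. */
int normalize_address(int af, const char *text, char *out, size_t outlen)
{
    /* Large enough for a 128-bit IPv6 or a 32-bit IPv4 address. */
    unsigned char binary[sizeof(struct in6_addr)];

    if (inet_pton(af, text, binary) != 1)
        return -1;
    if (inet_ntop(af, binary, out, (socklen_t)outlen) == NULL)
        return -1;
    return 0;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough to hold the printable form of either address family.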

Interface Identification

The functions if_indextoname() and if_nametoindex() map between an interface index and an interface name. The if_nameindex() function returns all of the interface names and indexes. The if_freenameindex() function frees the dynamic memory allocated by if_nameindex(). These functions can be used, for example, to specify the interface from which to send multicast packets. (See the example in the description of the IPV6_MULTICAST_IF socket option in “Multicast Socket Options” in Chapter 3.) Interface indexes are small positive integers.

These functions use the if_nameindex structure, defined in net/if.h, which can hold the information about a single interface (see the following example).

struct if_nameindex {
       unsigned int if_index;  /* 1, 2, ... */
       char     *if_name;      /* null terminated name: ef0,...*/
};
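
The following is a minimal sketch of enumerating the interfaces with if_nameindex(). It relies on the standard convention that the returned array ends with an entry whose if_index is 0 and whose if_name is NULL; the helper name list_interfaces() is hypothetical.

```c
#include <net/if.h>
#include <stdio.h>

/* Print the index and name of every interface.  Returns the number of
 * interfaces found, or -1 if the list could not be obtained.
 * list_interfaces() is a hypothetical helper. */
int list_interfaces(FILE *out)
{
    struct if_nameindex *list, *p;
    int count = 0;

    list = if_nameindex();
    if (list == NULL)
        return -1;

    /* The array is terminated by an entry with if_index == 0. */
    for (p = list; p->if_index != 0; p++) {
        fprintf(out, "%u: %s\n", p->if_index, p->if_name);
        count++;
    }

    if_freenameindex(list);   /* release the memory if_nameindex() allocated */
    return count;
}
```

A single name can be checked without enumerating the list: if_nametoindex() returns 0 when no interface has the given name.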

The Client/Server Model

The most commonly used paradigm in constructing distributed applications is the client/server model. In this scheme, client applications request services from a server process. This implies an asymmetry in establishing communication between the client and server. (See “Establishing Socket Connections”, for details.) This section examines the interactions between client and server, and considers some of the problems in developing client and server applications.

The client and server require a well-known set of conventions before service can be rendered (and accepted). This set of conventions constitutes a protocol that must be implemented at both ends of a connection. The protocol can be symmetric or asymmetric. In a symmetric protocol, either side can play the master or slave role. In an asymmetric protocol, one side is always the master, and the other is the slave. An example of a symmetric protocol is TELNET, which is used in the Internet for remote terminal emulation. An example of an asymmetric protocol is the Internet File Transfer Protocol (FTP). Regardless of whether the protocol is symmetric or asymmetric, accessing a service always involves a “server process” and a “client process.”

A server process normally listens at a well-known address for service requests. That is, the server process remains dormant until a connection is requested by a client's connection to the server's address. At such a time the server process “wakes up” and services the client, performing actions the client requests.

An alternative scheme, in which one server listens on behalf of many services, can eliminate a flock of server processes that clog the system while remaining dormant most of the time. For Internet servers in BSD-based systems, this scheme has been implemented via inetd, the so-called “Internet super-server.” The inetd daemon listens at a variety of ports, determined at startup by reading a configuration file. When a connection is requested to a port on which inetd is listening, inetd executes the appropriate server program to handle the client. With this method, clients are unaware that an intermediary such as inetd has played any part in the connection. The inetd daemon is described in more detail in “Advanced Topics”.

Connection-based Servers

Most servers are accessed at well-known Internet addresses. The remote login server's main loop takes the form shown in this sample code:

main(int argc, char **argv)
{
    int f;
    struct sockaddr_in from;
    struct servent *sp;
    
    sp = getservbyname("login", "tcp");
    if (sp == NULL) {
        fprintf(stderr,
            "rlogind: tcp/login: unknown service\n");
        exit(1);
    }
    ...
#ifndef DEBUG
    /* Disassociate server from controlling terminal */
    ...
#endif
    /* Restricted port -- see "Address Binding" */
    from.sin_port = sp->s_port;
    ...
    f = socket(AF_INET, SOCK_STREAM, 0);
    ...
    if (bind(f, (struct sockaddr *) &from,
             sizeof (from)) < 0) {
        syslog(LOG_ERR, "rlogind: bind: %m");
        exit(1);
    }
    ...
    listen(f, 5);
    for (;;) {
        int g, len = sizeof (from);
        g = accept(f, (struct sockaddr *)&from, &len);
        if (g < 0) {
            if (errno != EINTR) {
                syslog(LOG_ERR, "rlogind: accept: %m");
            }
            continue;
        }
        if (fork() == 0) {            /* child */
            close(f);
            doit(g, &from);
        }
        close(g);                /* parent */
    }
}

The first step taken by the server is to look up its service definition:

sp = getservbyname("login", "tcp");
if (sp == NULL) {
    fprintf(stderr, "rlogind: tcp/login: unknown service\n");
    exit(1);
}

The result of the getservbyname() call is used in later portions of the code to define the well-known Internet port where the server listens for service requests (indicated by a connection).

The second step taken by the server is to disassociate from the controlling terminal of its invoker:

_daemonize(0, -1, -1, -1);

The _daemonize() function does the common work needed to put a program into the background or to make a program into a daemon. This generally includes fork()ing a new process, closing most files, and releasing the controlling terminal. See the daemonize(3) manual page for details.

Once disassociated, the server is protected from receiving signals delivered to the process group of the controlling terminal. Note, however, that a disassociated server can no longer send reports of errors to a terminal and must log errors via syslog().

Once a server has established a pristine environment, it creates a socket and begins accepting service requests. The bind() call is required to ensure that the server listens at its expected location. Note that the remote login server listens at a restricted port number and must therefore be run with a user ID of root. This concept of a “restricted port number” is specific to BSD-based systems; see “Address Binding”, for more information.
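
The sample server code elides the address setup that precedes bind(). The following sketch shows one way to fill that gap; the helper name open_listener() is hypothetical, and binding to INADDR_ANY (every local address) is an assumption of this sketch.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP socket bound to `port` (network byte order) on every
 * local address and mark it as listening.  Returns the descriptor, or
 * -1 with errno set.  open_listener() is a hypothetical helper. */
int open_listener(unsigned short port)
{
    struct sockaddr_in sin;
    int f;

    memset(&sin, 0, sizeof sin);       /* clear unused fields */
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = port;               /* already in network byte order */

    f = socket(AF_INET, SOCK_STREAM, 0);
    if (f < 0)
        return -1;
    if (bind(f, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        listen(f, 5) < 0) {
        int saved = errno;             /* close() may overwrite errno */

        close(f);
        errno = saved;
        return -1;
    }
    return f;
}
```

The remote login server would call open_listener(sp->s_port), passing the port exactly as getservbyname() returned it. A port of 0 asks the system to choose an unused port, which is convenient for testing without root privileges.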

The main body of the loop is shown in this example:

for (;;) {
    int g, len = sizeof (from);
    g = accept(f, (struct sockaddr *)&from, &len);
    if (g < 0) {
        if (errno != EINTR) {
            syslog(LOG_ERR, "rlogind: accept: %m");
        }
        continue;
    }
    if (fork() == 0) {    /* Child */
        close(f);
        doit(g, &from);
    }
    close(g);             /* Parent */
}

An accept() call blocks the server until a client requests service. This call could return a failure status if interrupted by a signal such as SIGCHLD. Therefore, the return value from accept() is checked to ensure that a connection has actually been established, and an error report is logged via syslog() if an error has occurred.

With a connection established, the server then fork()s a child process and invokes the main body of the remote login protocol processing. Note that the socket used by the parent for queuing connection requests is closed in the child, while the socket created as a result of accept() is closed in the parent. The address of the client is also handed to the doit() routine, because the routine requires it in authenticating clients.

Connection-based Clients

The client side of the remote login service was described in “Network Dependencies”. The separate, asymmetric roles of the client and server show clearly in the code. The server is a passive entity, listening for client connections, while the client process is an active entity, initiating a connection when invoked.

Consider the steps taken by the client remote login process. As in the server process, the first step is to locate the service definition for a remote login:

sp = getservbyname("login", "tcp");
if (sp == NULL) {
    fprintf(stderr, "rlogin: tcp/login: unknown service\n");
    exit(1);
}

Then the gethostbyname() call looks up the destination host:

hp = gethostbyname(argv[1]);
if (hp == NULL) {
    fprintf(stderr, "rlogin: %s: unknown host\n", argv[1]);
    exit(2);
}

Next, a connection is established to the server at the requested host and the remote login protocol is started. The address buffer is cleared and is then filled in with the Internet address of the remote host and the port number of the login process on the remote host:

bzero((char *)&server, sizeof (server));
bcopy(hp->h_addr, (char *) &server.sin_addr, hp->h_length);
server.sin_family = hp->h_addrtype;
server.sin_port = sp->s_port;

A socket is created and a connection is initiated:

s = socket(hp->h_addrtype, SOCK_STREAM, 0);
if (s < 0) {
    perror("rlogin: socket");
    exit(3);
}
    ...
if (connect(s, (struct sockaddr *)&server,
            sizeof (server)) < 0) {
    perror("rlogin: connect");
    exit(4);
}

Note that connect() implicitly performs a bind() call in this case, since s is unbound.

Connectionless Servers

While connection-based services are the norm, some services are based on the use of datagram sockets. The rwho service is an example. It provides users with status information for hosts connected to a local area network. This service is predicated on the ability to broadcast information to all hosts connected to a particular network.

A user on any machine running the rwho server can find out the current status of a machine with the ruptime program (see ruptime(1C)). For example, ruptime might generate this output:

dali      up    2+06:28,    9 users, load 1.04, 1.20, 1.65
breton    down  0:24
manray    up    3+06:18,    0 users, load 0.03, 0.03, 0.05
magritte  up    1+00:43,    2 users, load 0.22, 0.09, 0.07

Status information for each host is periodically broadcast by rwho server processes on each machine. The same server process also receives the status information and uses it to update a database. This database is then interpreted to generate the status information for each host. Servers operate autonomously, coupled only by the local network and its broadcast capabilities.

The use of broadcast for such a task is inefficient, as all hosts must process each message, whether or not they are using an rwho server. Unless such a service is sufficiently universal and frequently used, the expense of periodic broadcasts outweighs the simplicity. However, on a very small network (for example, one dedicated to a computation engine and several display engines) broadcast works well because all services are universal.


Note: Multicasting reduces the load on host machines and is an alternative to broadcasting. Setting up multicast sockets is described in “IP Multicasting”.

The rwho server, in a simplified form, is shown in this code sample:

main()
{
    ...
    sp = getservbyname("who", "udp");
    from.sin_addr.s_addr = htonl(INADDR_ANY);
    from.sin_port = sp->s_port;
    ...
    s = socket(AF_INET, SOCK_DGRAM, 0);
    ...
    on = 1;
    if (setsockopt(s, SOL_SOCKET, SO_BROADCAST,
                   &on, sizeof(on)) < 0) {
        syslog(LOG_ERR, "setsockopt SO_BROADCAST: %m");
        exit(1);
    }
    bind(s, (struct sockaddr *)&from, sizeof (from));
    ...
    signal(SIGALRM, onalrm);
    onalrm();
    for (;;) {
        struct whod wd;
        int cc, whod, len = sizeof (from);
        
        cc = recvfrom(s, (char *)&wd, sizeof (struct whod),
                      0, (struct sockaddr *)&from, &len);
        if (cc <= 0) {
            if (cc < 0 && errno != EINTR) {
                syslog(LOG_ERR, "rwhod: recv: %m");
            }
            continue;
        }
        if (from.sin_port != sp->s_port) {
            syslog(LOG_ERR, "rwhod: %d: bad from port",
                   ntohs(from.sin_port));
            continue;
        }
        ...
        if (!verify(wd.wd_hostname)) {
            syslog(LOG_ERR,
                   "rwhod: malformed host name from %x",
                   ntohl(from.sin_addr.s_addr));
            continue;
        }
        (void) sprintf(path, "%s/whod.%s", RWHODIR,
                       wd.wd_hostname);
        whod = open(path, O_WRONLY|O_CREAT|O_TRUNC, 0666);
        ...
        /*undo header byte swapping before writing to file*/
        wd.wd_sendtime = ntohl(wd.wd_sendtime);
        ...
        (void) time(&wd.wd_recvtime);
        (void) write(whod, (char *)&wd, cc);
        (void) close(whod);
    }
}

The server performs two separate tasks. The first task is to receive status information broadcast by other hosts on the network. This job is carried out in the main loop of the program. Packets received at the rwho port are interrogated to make sure they were sent by another rwho server process. They are then time-stamped with their arrival time and used to update a file indicating the status of the host. When a host has not been heard from for an extended period of time, the database interpretation routines assume the host is down and indicate such on the status reports. This algorithm is prone to error, because an rwho server can be down while a host is actually up.

The second task performed by the server is to supply host status information. This task involves periodically acquiring system status information, packaging it in a message, and broadcasting it on the local network for other rwho servers to hear. The supply function is triggered by a timer and runs off a signal.
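
The supply side is not shown in the sample code. The following is a minimal sketch of its transmission step only. The helper names are hypothetical, and sending to the limited broadcast address INADDR_BROADCAST is a simplifying assumption of this sketch, not a description of how the rwho server chooses its destinations.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a datagram socket that is permitted to send broadcasts.
 * Returns the descriptor, or -1 with errno set.  Hypothetical helper. */
int open_broadcast_socket(void)
{
    int on = 1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) < 0) {
        int saved = errno;             /* close() may overwrite errno */

        close(s);
        errno = saved;
        return -1;
    }
    return s;
}

/* Send one status message to the limited broadcast address (an
 * assumption of this sketch).  `port` is in network byte order.
 * Returns the number of bytes sent, or -1 with errno set. */
ssize_t broadcast_status(int s, unsigned short port,
                         const void *msg, size_t len)
{
    struct sockaddr_in to;

    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_BROADCAST);
    to.sin_port = port;
    return sendto(s, msg, len, 0, (struct sockaddr *)&to, sizeof to);
}
```

A timer-driven handler such as onalrm() in the sample could gather the status information, call broadcast_status(), and then rearm the timer.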

Deciding where to transmit the resultant packet is somewhat problematic. Status information must be broadcast on the local network. For networks that do not support broadcast, another scheme must be used. One possibility is to enumerate the known neighbors (based on the status messages received from other rwho servers). This method requires some bootstrapping information, because a server will have no idea what machines are its neighbors until it receives status messages from them. Therefore, if all machines on a network are freshly booted, no machine will have any known neighbors and thus will never receive, or send, any status information. This problem also occurs in the routing table management process in propagating routing status information.

The standard solution is to inform one or more servers of known neighbors and request that the servers always communicate with these neighbors. If each server has at least one neighbor supplied to it, status information can then propagate through a neighbor to hosts that are not directly neighbors.

If the server is able to support networks that provide a broadcast capability, as well as those that do not, networks with an arbitrary topology can share status information. However, network loops can cause problems. That is, if a host is connected to multiple networks, it will receive status information from itself. This situation can lead to an endless, wasteful exchange of information.

Software operating in a distributed environment should not have any site-dependent information compiled into it. Otherwise, each host would need its own separate copy of the server, making server maintenance difficult. The BSD model attempts to isolate host-specific information from applications by providing system calls that return the necessary information. An example of such a call is gethostname() (see gethostname(2)), which returns the host's “official” name. In addition, an ioctl call can find the collection of networks to which a host is directly connected. Furthermore, a local network broadcasting mechanism has been implemented at the sockets level.

Combining these features lets a process broadcast on any directly connected local network that supports the notion of broadcasting in a site-independent manner. The system decides how to propagate status information in the case of rwho, or more generally in broadcasting. Such status information is broadcast to connected networks at the sockets level, where the connected networks have been obtained via the appropriate ioctl calls. The specifics of this kind of broadcasting are discussed in the next section, “Advanced Topics.”

Advanced Topics

For most users of the sockets interface, the mechanisms already described will suffice in constructing distributed applications. However, you might need to use some of the more advanced features described in this section.

Out-of-Band Data

Stream sockets can accommodate “out-of-band” data. Out-of-band data is transmitted on a logically independent transmission channel associated with each pair of connected stream sockets. Out-of-band data is delivered to the user independently of normal data. For stream sockets, the out-of-band data facilities must support the reliable delivery of at least one out-of-band message at a time. This message can contain at least one byte of data, and at least one message can be pending delivery to the user at any one time.

For communication protocols that support only in-band signaling (that is, the urgent data is delivered in sequence with the normal data), the system extracts the data from the normal data stream and stores it separately. You can choose between receiving urgent data in sequence and receiving it out of sequence, without having to buffer all the intervening data.

It is possible to “peek” (via MSG_PEEK) at out-of-band data. If the socket has a process group, SIGURG is generated when the protocol is notified that out-of-band data exists. A process can set the process group or process ID to be informed by SIGURG via the appropriate fcntl call as described for SIGIO (see “Interrupt-driven Sockets I/O”). If multiple sockets can have out-of-band data awaiting delivery, a select call for exceptional conditions can be used to determine which sockets have such data pending. Neither the signal nor the select indicates the actual arrival of the out-of-band data, only notification of pending data.

In addition to the information passed, a logical mark is placed in the data stream to indicate the point at which the out-of-band data was sent. The remote login and remote shell applications use this facility to propagate signals between client and server processes. When a signal flushes pending output from the remote process(es), all data up to the mark in the data stream is discarded.

To send an out-of-band message, the MSG_OOB flag is supplied to a send() or sendto() call. To receive out-of-band data, MSG_OOB should be indicated when performing a recvfrom() or recv() call. To find out if the read pointer is currently pointing at the mark in the data stream, use the SIOCATMARK ioctl:

int yes;
ioctl(s, SIOCATMARK, &yes);

If the value yes is a 1 on return, the next read will return data after the mark. Otherwise (assuming out-of-band data has arrived), the next read will provide data sent by the client prior to transmission of the out-of-band signal. Example 2-2 shows the routine used in the remote login process to flush output on receipt of an interrupt or quit signal. It reads the normal data up to the mark (to discard it) and then reads the out-of-band byte.

Example 2-2. Flushing Terminal I/O on Receipt of Out-of-Band Data

#include <stdio.h>
#include <unistd.h>
#include <termios.h>        /* POSIX-style */
#include <sys/ioctl.h>
#include <sys/socket.h>

int rem;                    /* socket connected to the remote host */

void
oob()
{
    int  mark;
    char waste[BUFSIZ];
    /* Flush local terminal output */
    tcflush(1, TCOFLUSH);
    for (;;) {
        if (ioctl(rem, SIOCATMARK, &mark) < 0) {
            perror("ioctl");
            break;
        }
        if (mark)
            break;
        (void) read(rem, waste, sizeof (waste));
    }
    if (recv(rem, &mark, 1, MSG_OOB) < 0) {
        perror("recv");
        ...
    }
    ...
}

A process can also read the out-of-band data without first reading up to the mark. Reading the out-of-band data in this way is more difficult when the underlying protocol delivers the urgent data in-band with the normal data and only sends notification of its presence ahead of time (for example, the TCP protocol used to implement streams in the Internet domain). With such protocols, the out-of-band byte may not yet have arrived when a recv is done with the MSG_OOB flag. In that case, the call will return an error of EWOULDBLOCK. Worse, there may be so much in-band data in the input buffer that normal flow control prevents the sender from sending the urgent data until the buffer is cleared. The process must then read enough of the queued data for the urgent data to be delivered.
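The sending and receiving calls described in this section can be sketched as follows. This is only an illustration under stated assumptions, not code taken from the remote login programs: the helper names send_urgent(), wait_urgent(), read_urgent(), and loopback_pair() are hypothetical, and loopback_pair() exists only to give the sketch a connected pair of TCP sockets to work with. The sketch assumes a POSIX-style sockets interface.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: send one byte of urgent data on stream socket s. */
int send_urgent(int s, char c)
{
    return (send(s, &c, 1, MSG_OOB) == 1) ? 0 : -1;
}

/* Hypothetical helper: wait up to timeout_sec seconds for an exceptional
 * condition (pending out-of-band data) on s.
 * Returns 1 if one is pending, 0 on timeout, -1 on error. */
int wait_urgent(int s, int timeout_sec)
{
    fd_set exceptfds;
    struct timeval tv;
    int n;

    FD_ZERO(&exceptfds);
    FD_SET(s, &exceptfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    n = select(s + 1, NULL, NULL, &exceptfds, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(s, &exceptfds)) ? 1 : 0;
}

/* Hypothetical helper: try to read the out-of-band byte.
 * Returns 1 and stores the byte in *c, 0 if the byte has not yet
 * arrived (EWOULDBLOCK), or -1 on any other error. */
int read_urgent(int s, char *c)
{
    ssize_t n = recv(s, c, 1, MSG_OOB);

    if (n == 1)
        return 1;
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return 0;
    return -1;
}

/* Demonstration scaffolding only: build a connected pair of TCP
 * sockets over the loopback interface.  sv[0] is the connecting
 * side, sv[1] the accepted side.  Returns 0 on success, -1 on error. */
int loopback_pair(int sv[2])
{
    struct sockaddr_in sin;
    socklen_t len = sizeof (sin);
    int l = socket(AF_INET, SOCK_STREAM, 0);

    sv[0] = sv[1] = -1;
    if (l < 0)
        return -1;
    memset(&sin, 0, sizeof (sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);
    if (bind(l, (struct sockaddr *) &sin, sizeof (sin)) == 0 &&
        listen(l, 1) == 0 &&
        getsockname(l, (struct sockaddr *) &sin, &len) == 0 &&
        (sv[0] = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(sv[0], (struct sockaddr *) &sin, sizeof (sin)) == 0)
        sv[1] = accept(l, NULL, NULL);
    close(l);
    if (sv[1] < 0) {
        if (sv[0] >= 0)
            close(sv[0]);
        return -1;
    }
    return 0;
}
```

A caller might invoke send_urgent(sv[0], '!') on one end, then wait_urgent(sv[1], 5) followed by read_urgent() on the other. A return of 0 from read_urgent() corresponds to the case described above in which the notification has arrived but the urgent byte itself has not.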

Certain programs that use multiple bytes of urgent data and must handle multiple urgent signals—for example, telnet (see telnet(1C))—need to retain the position of urgent data within the stream. This treatment is available as a sockets-level option, SO_OOBINLINE; see setsockopt(2) for usage. With this option, the position of urgent data (the “mark”) is retained, but the urgent data immediately follows the mark within the normal data stream returned without the MSG_OOB flag. Reception of multiple urgent indications causes the mark to move, but no out-of-band data is lost.

Nonblocking Sockets

Programs that cannot wait for a socket operation to be completed should use nonblocking sockets. I/O requests on nonblocking sockets return with an error if the request cannot be satisfied immediately.

Once a socket has been created with the socket() call, it can be marked as nonblocking by fcntl() as follows:

#include <fcntl.h>
 ...
int s;
 ...
s = socket(AF_INET, SOCK_STREAM, 0);
 ...
if (fcntl(s, F_SETFL, FNDELAY) < 0)  {
    perror("fcntl F_SETFL, FNDELAY");
    exit(1);
}
...

When performing nonblocking I/O on sockets, check for the error EAGAIN (stored in the global variable errno). This error occurs when an operation would normally block, but the socket it was performed on is nonblocking. In particular, accept(), connect(), send(), recv(), read(), and write() can all return EAGAIN, and processes should be prepared to deal with this return code.


Note: In previous releases of IRIX, the error EWOULDBLOCK was sometimes returned instead of EAGAIN. In the current release of IRIX, EWOULDBLOCK is defined as EAGAIN for source compatibility.

If an operation such as a send() cannot be completed, but partial writes are sensible (for example, when using a stream socket), data that can be sent immediately is processed, and the return value indicates the amount that was actually sent.
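The following sketch shows one way to cope with EAGAIN and partial writes on a nonblocking stream socket. It is a minimal illustration, not a definitive implementation: the function names set_nonblocking() and send_all() are hypothetical, and the sketch uses the POSIX flag name O_NONBLOCK rather than FNDELAY, on the assumption that the system provides it. Unlike the fragment above, set_nonblocking() reads the current flags first so that other status flags are preserved.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Mark descriptor fd nonblocking while preserving its other flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Send all len bytes of buf on nonblocking socket s.  When the socket
 * would block, wait with select() until it is writable again.
 * Returns len on success, -1 on error. */
ssize_t send_all(int s, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = send(s, buf + done, len - done, 0);

        if (n >= 0) {
            done += (size_t) n;      /* partial write: advance and retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            fd_set wfds;

            FD_ZERO(&wfds);
            FD_SET(s, &wfds);
            if (select(s + 1, NULL, &wfds, NULL, NULL) < 0 &&
                errno != EINTR)
                return -1;
        } else if (errno != EINTR) {
            return -1;
        }
    }
    return (ssize_t) done;
}
```

After set_nonblocking(s) succeeds, a recv() on s with no data queued returns -1 with errno set to EAGAIN instead of blocking.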

Interrupt-driven Sockets I/O

The SIGIO signal allows a process to be notified when a socket (or more generally, a file descriptor) has data waiting to be read. Use of the SIGIO facility requires three steps:

  1. The process must use a signal call to set up a SIGIO signal handler.

  2. The process must specify which process or process group is to receive notification of pending input: either its own process ID or the group ID of its process group (see “Signals and Process Groups”). The default process group of a socket is group zero. To do this, the process uses an fcntl call.

  3. The process uses another fcntl call to enable asynchronous notification of pending I/O requests. Example 2-3 shows sample code that enables a process to receive information on pending I/O requests as they occur for a socket s. With the addition of a handler for SIGURG, this code can be used to prepare for receipt of SIGURG signals.

    Example 2-3. Asynchronous Notification of I/O Requests

    #include <signal.h>
    #include <fcntl.h>
    
    ...
    int    io_handler();
    ...
    main()
    {
        signal(SIGIO, io_handler);
        
        /* Set the process receiving SIGIO/SIGURG signals to us */
        
        if (fcntl(s, F_SETOWN, getpid()) < 0) {
            perror("fcntl F_SETOWN");
            exit(1);
        }
        
        /* Allow receipt of asynchronous I/O signals */
        if (fcntl(s, F_SETFL, FASYNC) < 0) {
            perror("fcntl F_SETFL, FASYNC");
            exit(1);
        }
    }
    io_handler()
    {
    ...
    }
    


Signals and Process Groups

Due to the existence of the SIGURG and SIGIO signals, each socket has an associated process number. This value is initialized to zero, but it can be redefined at a later time with the F_SETOWN fcntl, as was done in the previous code for SIGIO. To set the socket's process ID for signals, positive arguments should be given to the fcntl call. To set the socket's process group for signals, negative arguments should be passed to fcntl. Note that the process number indicates either the associated process ID or the associated process group; it is impossible to specify both at the same time. A similar fcntl, F_GETOWN, is available for determining the current process number of a socket.

Another useful signal you can use when constructing server processes is SIGCHLD, which is delivered to a process when any child processes have changed state. Normally, servers use SIGCHLD to “reap” child processes that have exited, without explicitly awaiting their termination or periodically polling for exit status. For example, the remote login server loop shown in “Connection-based Servers” can be augmented, as shown in Example 2-4.

Example 2-4. Using the SIGCHLD Signal

#include <signal.h>

int reaper();
...
main()
{
    ...
    signal(SIGCHLD, reaper);
    listen(f, 5);
    for (;;) {
        int g, len = sizeof (from);

        g = accept(f, (struct sockaddr *)&from, &len);
        if (g < 0) {
            if (errno != EINTR) {
                syslog(LOG_ERR, "rlogind: accept: %m");
            }
            continue;
        }
        ...
       }
}
#include <sys/wait.h>
reaper()
{
    union wait status;

    while (wait3(&status, WNOHANG, 0) > 0) {
        ;    /* no-op */
    }
}

If the parent server process fails to reap its children, a large number of “zombie” processes can be created.

Pseudo-Terminals

Many programs will not function properly without a terminal for standard input and output. Since sockets do not provide the semantics of terminals, it is often necessary to have a process communicate over the network through a pseudo-terminal. A pseudo-terminal is actually a pair of devices, master and slave, that allows a process to serve as an active agent in communication between processes and users. Data written on the slave side of a pseudo-terminal is supplied as input to a process reading from the master side, while data written on the master side is processed as terminal input for the slave. In this way, the process manipulating the master side of the pseudo-terminal has control over the information read and written on the slave side, as if it were manipulating the keyboard and reading the screen on a real terminal. The purpose of this abstraction is to preserve terminal semantics over a network connection. The slave side appears as a normal terminal to any process reading from or writing to it.

For example, the remote login server uses pseudo-terminals for remote login sessions. A user logging in to a machine across the network gets a shell with a slave pseudo-terminal as standard input, output, and error. The server process then handles the communication between the programs invoked by the remote shell and the user's local client process. When a user sends a character that causes a remote machine to flush terminal output, the pseudo-terminal generates a control message for the server process. The server then sends an out-of-band message to the client process to signal a flush of data at the real terminal and on the intervening data buffered in the network.

Under IRIX, the name of the slave side of a pseudo-terminal has this syntax:

/dev/ttyqx

In this syntax, x is a number in the range 0 through 99. The master side of a pseudo-terminal is the generic device /dev/ptc.

Creating a pair of master and slave pseudo-terminals is straightforward. The master half of a pseudo-terminal is opened first. The slave side of the pseudo-terminal is then opened and is set to the proper terminal modes if necessary. The process then forks. The child closes the master side of the pseudo-terminal and execs the appropriate program. Meanwhile, the parent closes the slave side of the pseudo-terminal and begins reading and writing from the master side.

The sample code in Example 2-5 illustrates the use of pseudo-terminals. This code assumes that a connection on a socket s exists, connected to a peer that wants a service of some kind, and that the process has disassociated itself from any previously controlling terminal.

Example 2-5. Creating and Using a Pseudo-Terminal on IRIX

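#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <syslog.h>

int master, slave;
pid_t pid;
struct stat stb;
char line[sizeof("/dev/ttyqxxx")];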
master = open("/dev/ptc", O_RDWR | O_NDELAY);
if (master < 0 || fstat(master, &stb) < 0) {
    syslog(LOG_ERR, "All network ports in use");
    exit(1);
}
sprintf(line, "/dev/ttyq%d", minor(stb.st_rdev));
/* Put in separate process group, disassociate
   controlling terminal. */
setsid();

slave = open(line, O_RDWR);    /* Open slave side */
if (slave < 0) {
    syslog(LOG_ERR, "Cannot open slave pty %s", line);
    exit(1);
}
pid = fork();
if (pid < 0) {
    syslog(LOG_ERR, "fork: %m");
    exit(1);
}
if (pid > 0) {        /* Parent */
    close(slave);
    ...
} else {            /* Child */
    close(master);
    dup2(slave, 0);
    dup2(slave, 1);
    dup2(slave, 2);
    if (slave > 2)
        (void) close(slave);
    ...
}


Selecting Protocols

If the third argument to the socket() call is 0, socket() will select a default protocol to use with the returned socket of the type requested. The default protocol is usually correct, and alternate choices are not usually available. However, when using raw sockets to communicate directly with lower-level protocols or hardware interfaces, the protocol argument can be important for setting up de-multiplexing. For example, raw sockets in the Internet family can be used to implement a new protocol above IP, and the socket will receive packets only for the protocol specified.

To obtain a particular protocol, determine the protocol number as defined within the communication domain. For the Internet domain, you can use one of the library routines described in “Network Library Routines”. For example, you can use getprotobyname():

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
 ...
pp = getprotobyname("newtcp");
if (pp == NULL) {
    ...
}
s = socket(AF_INET, SOCK_STREAM, pp->p_proto);

This call results in a socket s using a stream-based connection, but with a protocol type of newtcp instead of the default tcp.

Address Binding

Binding addresses to sockets in the Internet domain can be fairly complex. These associations are composed of local and foreign addresses, and local and foreign ports. Port numbers are allocated out of separate spaces, one for each system and one for each domain on that system.

Through the bind() system call, a process can specify half of an association, the <local address, local port> part, while the connect and accept calls are used to complete a socket's association by specifying the <foreign address, foreign port> part. Since the association is created in two steps, the association uniqueness requirement could be violated unless care is taken.

Furthermore, user programs do not always know the proper values to use for the local address and local port, since a host can reside on multiple networks and the set of allocated port numbers is not directly accessible to a user.

To simplify local address binding in the Internet domain, a wildcard address is provided. When an address is specified as INADDR_ANY (a manifest constant defined in <netinet/in.h>), the system interprets the address as “any valid address.”

For example, to bind a specific port number to a socket but leave the local address unspecified, the following code might be used:

#include <sys/types.h>
#include <netinet/in.h>
 ...
struct sockaddr_in sin;
 ...
s = socket(AF_INET, SOCK_STREAM, 0);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_ANY);
sin.sin_port = htons(MYPORT);
bind(s, (struct sockaddr *) &sin, sizeof (sin));

Sockets with wildcarded local addresses can receive messages directed to the specified port number and sent to any of the possible addresses assigned to a host. For example, if a host has addresses 128.32.0.4 and 10.0.0.78, and a socket is bound as above, the process will be able to accept connection requests that are addressed to 128.32.0.4 or 10.0.0.78. For a server process to allow only hosts on a given network to connect to it, it would bind whichever of the server's addresses were on the appropriate network.

Similarly, a local port can be left unspecified (specified as zero), in which case the system selects an appropriate port number for it. For example, to bind a specific local address to a socket but leave the local port number unspecified, use this code:

hp = gethostbyname(hostname);
if (hp == NULL) {
   ...
}
bcopy(hp->h_addr, (char *) &sin.sin_addr, hp->h_length);
sin.sin_port = htons(0);
bind(s, (struct sockaddr *) &sin, sizeof (sin));
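When the port is left for the system to choose, the process usually needs to find out which port it was given. One way to do so, assuming the getsockname() call is available, is sketched below. The function name bind_any_port() is hypothetical, and the sketch takes the local address as a host-byte-order constant rather than looking it up with gethostbyname().

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Bind s to the given local address (host byte order) with port 0,
 * then ask the system which port it selected.
 * Returns the port in host byte order, or -1 on error. */
int bind_any_port(int s, in_addr_t host_addr)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof (sin);

    memset(&sin, 0, sizeof (sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(host_addr);
    sin.sin_port = htons(0);            /* let the system choose */
    if (bind(s, (struct sockaddr *) &sin, sizeof (sin)) < 0)
        return -1;
    if (getsockname(s, (struct sockaddr *) &sin, &len) < 0)
        return -1;
    return ntohs(sin.sin_port);
}
```

A call such as bind_any_port(s, INADDR_LOOPBACK) binds the socket to the loopback address and returns the port the system assigned.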

The system selects the local port number based on two criteria:

  • On BSD systems, Internet ports between 512 and 1023 (IPPORT_RESERVED - 1) are reserved for privileged users; Internet ports above IPPORT_USERRESERVED (5000) are reserved for nonprivileged servers; and Internet ports between IPPORT_RESERVED and IPPORT_USERRESERVED are used by the system for assignment to clients.

  • The port number may not be bound to another socket.

To find a free Internet port number in the privileged range, the rresvport library routine can be used as follows to return a stream socket with a privileged port number:

int lport = IPPORT_RESERVED - 1;
int s;
 ...
s = rresvport(&lport);
if (s < 0) {
    if (errno == EAGAIN)
        fprintf(stderr, "socket: all ports in use\n");
    else
        perror("rresvport: socket");
    ...
}

The restriction on allocating ports allows processes executing in a “secure” environment to perform authentication based on the originating address and port number. For example, the rlogin command (see rlogin(1C)) lets users log in across a network without being asked for a password, under two conditions:

  • The name of the system the user is logging in from is in the file /etc/hosts.equiv on the system being logged in to (or the system name and the user name are in the user's .rhosts file in the user's home directory).

  • The user's rlogin process is coming from a privileged port on the machine from which the user is logging in.

The port number and network address of the machine the user is logging in from can be determined either by the from result of the accept() call or from the getpeername() call.

The algorithm used by the system to select port numbers can be unsuitable for an application, because the algorithm creates associations in a two-step process. For example, FTP specifies that data connections must always originate from the same local port. However, duplicate associations are avoided by connecting to different foreign ports. The system disallows binding the same local address and port number to a socket if a previous data connection's socket still exists. To override the default port selection algorithm, the following option call must be performed before address binding:

 ...
int on = 1;
 ...
setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
bind(s, (struct sockaddr *) &sin, sizeof (sin));

With this call, local addresses that are already in use can be bound. Binding local addresses does not violate the uniqueness requirement, because the system still checks at connect time to make sure that any other sockets with the same local address and port do not have the same foreign address and port. If the association already exists, the error EADDRINUSE is returned.
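The sketch below shows the error that binding produces when the option is not set, and where the option call fits relative to bind(). The function name bind_port() and its reuse parameter are hypothetical. The exact circumstances in which SO_REUSEADDR permits a second binding vary between systems, so the sketch makes no assumption about them.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Bind s to the wildcard address and the given port (host byte
 * order).  If reuse is nonzero, set SO_REUSEADDR before binding.
 * Returns 0 on success or the errno value describing the failure. */
int bind_port(int s, unsigned short port, int reuse)
{
    struct sockaddr_in sin;
    int on = 1;

    if (reuse &&
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof (on)) < 0)
        return errno;

    memset(&sin, 0, sizeof (sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (bind(s, (struct sockaddr *) &sin, sizeof (sin)) < 0)
        return errno;
    return 0;
}
```

If one socket is already bound to a port and a second socket calls bind_port() for the same port with reuse set to 0, the call returns EADDRINUSE.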

Socket Options

You can use the setsockopt() and getsockopt() system calls to set and get a number of options on sockets. These options include marking a socket for broadcasting, not routing, lingering on closing, and so on. In addition, you can specify protocol-specific options for IP and TCP, as described in ip(7P) and tcp(7P), and in “IP Multicasting.”

The general form of the setsockopt() and getsockopt() calls is:

setsockopt(s, level, optname, optval, optlen);
getsockopt(s, level, optname, optval, optlen);

The parameters have these meanings:

  • s is the socket on which the option is to be applied.

  • level specifies the protocol layer on which the option is to be applied; in most cases, level is the sockets level, indicated by the symbolic constant SOL_SOCKET, defined in <sys/socket.h>.

  • optname specifies the actual option, a symbolic constant that is also defined in <sys/socket.h>.

  • optval points to the value of the option (in most cases, whether the option is to be turned on or off).

  • optlen points to the length of the value of the option. For getsockopt, optlen is a value-result parameter, initially set to the size of the storage area pointed to by optval and modified upon return to indicate the actual amount of storage used.

For example, sometimes it's useful to determine the type (stream or datagram) of an existing socket. Programs under inetd (described in “The inetd Daemon”) may need to perform this task. You can do so via the SO_TYPE socket option and the getsockopt call, shown in this code:

#include <sys/types.h>
#include <sys/socket.h>

int type, size;
size = sizeof (int);
if (getsockopt(s, SOL_SOCKET, SO_TYPE,
    (char *) &type, &size) < 0) {
    perror("getsockopt");
    ...
}

After the getsockopt call, type will be set to the value of the socket type, as defined in <sys/socket.h>. For example, if the socket were a datagram socket, type would have the value corresponding to SOCK_DGRAM.
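Not every option value is a simple integer. As a further illustration, the following sketch sets and reads back the lingering option mentioned earlier, whose value is a struct linger. The function names set_linger() and get_linger() are hypothetical, and the sketch assumes a system that declares the length argument as socklen_t rather than int.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Ask the system to linger for up to the given number of seconds
 * when socket s is closed with data still unsent. */
int set_linger(int s, int seconds)
{
    struct linger l;

    l.l_onoff = 1;              /* turn the option on */
    l.l_linger = seconds;
    return setsockopt(s, SOL_SOCKET, SO_LINGER, &l, sizeof (l));
}

/* Fetch the current linger setting into *l.  The length argument is
 * value-result: it goes in as the size of the buffer and comes back
 * as the amount of storage actually used.
 * Returns 0 on success, -1 on error. */
int get_linger(int s, struct linger *l)
{
    socklen_t len = sizeof (*l);

    if (getsockopt(s, SOL_SOCKET, SO_LINGER, l, &len) < 0)
        return -1;
    return (len == sizeof (*l)) ? 0 : -1;
}
```

After set_linger(s, 5), a call to get_linger(s, &l) should report a nonzero l.l_onoff and an l.l_linger of 5.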

The inetd Daemon

When a single daemon listens for requests for many daemons, instead of having each daemon listen for its own requests, the number of idle daemons is reduced and the implementation of each daemon is simplified.

The inetd daemon handles three types of service:

  • A standard service, which has a well-known port assigned to it and is listed in /etc/services or the NIS services map (see services(4)). It may be a service that implements an official Internet standard or a BSD UNIX-specific service.

  • An RPC service, which uses the Sun RPC calls as the transport; such services are listed in /etc/rpc or the NIS rpc map (see rpc(4)).

  • A TCPMUX service, which is nonstandard and does not have a well-known port assigned to it. TCPMUX services are invoked from inetd when a program connects to the tcpmux well-known port and specifies the service name. This is useful for adding locally developed servers.

The inetd daemon is invoked at boot time. It examines the file /usr/etc/inetd.conf to determine the servers it will listen for. Once this information has been read and a pristine environment created, inetd proceeds to create one socket for each service it is to listen for, binding the appropriate port number to each socket.

The inetd daemon performs a select() on these sockets for read() availability, waiting for a process to request a connection to the service corresponding to that socket. The inetd daemon then performs an accept() on the socket in question, fork()s, dup()s the new socket to file descriptors 0 and 1 (stdin and stdout), closes other open file descriptors, and execs the appropriate server.

Servers making use of inetd are considerably simplified, because inetd takes care of most of the IPC work required in establishing a connection. The server invoked by inetd expects the socket connected to its client on file descriptors 0 and 1, and can immediately perform any operations such as read(), write(), send(), or recv(). Servers can use buffered I/O as provided by the stdio conventions, as long as they use fflush() when appropriate. However, for server programs that handle multiple services or protocols, inetd allocates socket descriptors to protocols based on lexicographic order of service and protocol name.

For example, the RPC mount daemon, rpc.mountd, has two entries in inetd.conf for its TCP and UDP ports. When invoked by inetd, the TCP socket is on descriptor 0, and UDP is on 1.

When writing servers under inetd, you can use the getpeername call to return the address of the peer (process) connected on the other end of the socket. For example, to log a client's Internet address in “dot notation” (for example, 128.32.0.4), you might use the following code:

struct sockaddr_in name;
int namelen = sizeof (name);
 ...
if (getpeername(0, (struct sockaddr *)&name, &namelen) < 0) {
    syslog(LOG_ERR, "getpeername: %m");
    exit(1);
} else {
    syslog(LOG_INFO, "Connection from %s",
           inet_ntoa(name.sin_addr));
}

While the getpeername call is especially useful when writing programs to run with inetd, it can be used by stand-alone servers.
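To show how little a server started by inetd has to do, here is a minimal sketch of a service routine that simply echoes its input. The function name echo_service() is hypothetical. It takes the descriptors as parameters so that it can be exercised outside inetd; a server started by inetd would call it with descriptors 0 and 1.

```c
#include <unistd.h>
#include <sys/types.h>

/* Copy everything read from in_fd to out_fd until end of file.
 * Returns the total number of bytes echoed, or -1 on error. */
long echo_service(int in_fd, int out_fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof (buf))) > 0) {
        ssize_t off = 0;

        /* write() may accept fewer bytes than requested */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t) (n - off));

            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return (n < 0) ? -1 : total;
}
```

Under inetd, main() could consist of little more than a call to echo_service(0, 1), because the connection has already been accepted and attached to those descriptors.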

Standard TCP services are assigned unique well-known port numbers in the range of 0 to 255. These ports are of a limited number and are typically only assigned to official Internet protocols. The TCPMUX service, as described in RFC-1078, allows you to add locally developed protocols without needing an official TCP port assignment.

The protocol used by TCPMUX is simple: a TCP client connects to a foreign host on TCP port 1. It sends the service name followed by a carriage-return/line-feed pair (<CRLF>). The server replies with a single character indicating positive (+) or negative (-) acknowledgment, immediately followed by an optional message of explanation, terminated with a <CRLF>. If the reply was positive, the selected protocol begins; otherwise, the connection is closed. In the IRIX system, the TCPMUX service is built into inetd; that is, inetd listens on TCP port 1 for requests for TCPMUX services listed in inetd.conf.

The following code is an example TCPMUX server and its inetd.conf entry:

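#include <sys/types.h>
#include <stdio.h>
#include <time.h>

int
main()
{
    time_t t;

    /* Send the positive acknowledgment first */
    printf("+Go\r\n");
    fflush(stdout);
    time(&t);
    /* time_t is not necessarily an int; print it as a long */
    printf("%ld = %s", (long) t, ctime(&t));
    fflush(stdout);
    return 0;
}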

More sophisticated servers may want to do additional processing before returning the positive or negative acknowledgment.

The inetd.conf entry is:

tcpmux/current_time stream tcp nowait guest /d/curtime curtime

The following portion of the client code handles the TCPMUX handshake:

char line[BUFSIZ];
char *lp;
FILE *fp;
 ...
/* Use stdio for reading data from the server */
fp = fdopen(sock, "r");
if (fp == NULL) {
    fprintf(stderr, "Can't create file pointer\n");
    exit(1);
}
/* Send service request */
sprintf(line, "%s\r\n", "current_time");
if (write(sock, line, strlen(line)) < 0) {
    perror("write");
    exit(1);
}

/* Get ACK/NAK response from the server */
if (fgets(line, sizeof(line), fp) == NULL) {
    if (feof(fp)) {
        die();
    } else {
        fprintf(stderr, "Error reading response\n");
        exit(1);
    }
}
/* Delete <CR> */
if ((lp = index(line, '\r')) != NULL) {
    *lp = ' ';
}

switch (line[0]) {
    case '+':
        printf("Got ACK: %s\n", &line[1]);
        break;
    case '-':
        printf("Got NAK: %s\n", &line[1]);
        exit(0);
    default:
        printf("Got unknown response: %s\n", line);
        exit(1);
}
/* Get rest of data from the server */
while ((fgets(line, sizeof(line), fp)) != NULL) {
    fputs(line, stdout);
}

Broadcasting

Using a datagram socket, you can send broadcast packets on many networks supported by the system. The network itself must support broadcast; the system provides no simulation of broadcast in software. Broadcast messages can place a high load on a network, since they force every host on the network to service them. Consequently, the ability to send broadcast packets has been limited to sockets explicitly marked to allow broadcasting. Broadcast is typically used for one of two reasons: to find a resource on a local network without prior knowledge of its address or to send information to all accessible neighbors.


Note: Multicasting is an alternative to broadcasting. See “IP Multicasting” for information about setting up multicast sockets.

To send a broadcast message, create a datagram socket:

s = socket(AF_INET, SOCK_DGRAM, 0);

Mark the socket to allow broadcasting:

int on = 1;
setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on));

Bind a port number to the socket:

sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_ANY);
sin.sin_port = htons(MYPORT);
bind(s, (struct sockaddr *) &sin, sizeof (sin));

The destination address of the broadcast message depends on the network(s). The Internet domain supports a shorthand notation for broadcast on the local network, the address INADDR_BROADCAST (defined in <netinet/in.h>).
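The steps above can be gathered into two small helpers, as in the following sketch. The function names broadcast_socket() and broadcast_address() are hypothetical, and the port passed in is whatever port the application has chosen.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Create a datagram socket and mark it to allow broadcasting.
 * Returns the socket, or -1 on failure. */
int broadcast_socket(void)
{
    int on = 1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Fill in dst with the local-network broadcast address
 * (INADDR_BROADCAST) and the given port (host byte order). */
void broadcast_address(struct sockaddr_in *dst, unsigned short port)
{
    memset(dst, 0, sizeof (*dst));
    dst->sin_family = AF_INET;
    dst->sin_addr.s_addr = htonl(INADDR_BROADCAST);
    dst->sin_port = htons(port);
}
```

With s obtained from broadcast_socket() and dst filled in by broadcast_address(), a message can then be sent with sendto(s, msg, msglen, 0, (struct sockaddr *) &dst, sizeof (dst)), where msg and msglen stand for the application's own data.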

Determining the list of addresses for all reachable neighbors requires knowledge of the networks to which the host is connected. Since this information should be obtained in a host-independent fashion and may be impossible to derive, IRIX provides a method for retrieving this information from the system data structures.

The SIOCGIFCONF ioctl call returns the interface configuration of a host in the form of a single ifconf structure. This structure contains a data area that is made up of an array of ifreq structures, one for each network interface to which the host is connected.

These structures are defined in <net/if.h>, as shown in this example:

struct ifconf {
    int ifc_len;    /* size of associated buffer */
    union {
        caddr_t  ifcu_buf;
        struct   ifreq *ifcu_req;
    } ifc_ifcu;
};

/* Buffer address */
#define ifc_buf   ifc_ifcu.ifcu_buf

/* Array of structures returned */
#define ifc_req   ifc_ifcu.ifcu_req

#define IFNAMSIZ        16
struct  ifreq {

        /* Interface name, e.g. "en0" */
        char    ifr_name[IFNAMSIZ];             
        union {
                struct  sockaddr ifru_addr;
                struct  sockaddr ifru_dstaddr;
                struct  sockaddr ifru_broadaddr;
                short   ifru_flags;
                int     ifru_metric;
                /* MIPS ABI - unused by BSD */
                char    ifru_data[1];    
                char    ifru_enaddr[6];         /* MIPS ABI */
                char    ifru_oname[IFNAMSIZ];   /* MIPS ABI */
                struct  ifstats ifru_stats;

                /* Trusted IRIX */
                struct {
                        caddr_t ifruv_base;
                        int     ifruv_len;
                }       ifru_vec;
        } ifr_ifru;
};

/* Address */
#define ifr_addr        ifr_ifru.ifru_addr      

/* Other end of p-to-p link */
#define ifr_dstaddr     ifr_ifru.ifru_dstaddr   

/* Broadcast address */
#define ifr_broadaddr   ifr_ifru.ifru_broadaddr 

/* Flags */
#define ifr_flags       ifr_ifru.ifru_flags     

/* Metric */
#define ifr_metric      ifr_ifru.ifru_metric    

/* For use by interface */
#define ifr_data        ifr_ifru.ifru_data      

/* Ethernet address */
#define ifr_enaddr      ifr_ifru.ifru_enaddr    

/* Other interface name */
#define ifr_oname       ifr_ifru.ifru_oname     

/* Statistics */
#define ifr_stats       ifr_ifru.ifru_stats     

/* Trusted IRIX */
#define ifr_base        ifr_ifru.ifru_vec.ifruv_base
#define ifr_len         ifr_ifru.ifru_vec.ifruv_len

The following call obtains the interface configuration:

struct ifconf ifc;
char buf[BUFSIZ];

ifc.ifc_len = sizeof (buf);
ifc.ifc_buf = buf;
if (ioctl(s, SIOCGIFCONF, (char *) &ifc) < 0) {
    ...
}

After this call, buf will contain one ifreq structure for each network to which the host is connected, and ifc.ifc_len will have been modified to reflect the number of bytes used by the ifreq structures.

Each structure has an associated set of interface flags that tell whether the network corresponding to that interface is up or down, point-to-point or broadcast, and so on. The SIOCGIFFLAGS ioctl retrieves these flags for an interface specified by an ifreq structure:

struct ifreq *ifr;
struct sockaddr dst;
int n;

ifr = ifc.ifc_req;
for (n = ifc.ifc_len / sizeof (struct ifreq); --n >= 0;
     ifr++) {
    
    /* Be careful not to use an interface devoted to an
     * address family other than the one intended */
    if (ifr->ifr_addr.sa_family != AF_INET)
        continue;
    if (ioctl(s, SIOCGIFFLAGS, (char *) ifr) < 0) {
        ...
    }
    /*
     * Skip boring cases.
     */
    if ((ifr->ifr_flags & IFF_UP) == 0 ||
        (ifr->ifr_flags & IFF_LOOPBACK) ||
        (ifr->ifr_flags &
         (IFF_BROADCAST | IFF_POINTTOPOINT)) == 0) {
            continue;
    }

Once you retrieve the flags, retrieve the broadcast address. For broadcast networks, retrieval is done via the SIOCGIFBRDADDR ioctl. For point-to-point networks, the address of the destination host is obtained with SIOCGIFDSTADDR:

if (ifr->ifr_flags & IFF_POINTTOPOINT) {
    if (ioctl(s, SIOCGIFDSTADDR, (char *) ifr) < 0) {
        ...
    }
    /* Copy from the address of the structure member */
    bcopy((char *) &ifr->ifr_dstaddr, (char *) &dst,
          sizeof (ifr->ifr_dstaddr));
    
} else if (ifr->ifr_flags & IFF_BROADCAST) {
    if (ioctl(s, SIOCGIFBRDADDR, (char *) ifr) < 0) {
        ...
    }
    /* Copy from the address of the structure member */
    bcopy((char *) &ifr->ifr_broadaddr, (char *) &dst,
          sizeof (ifr->ifr_broadaddr));
}

After the appropriate ioctls get the broadcast or destination address (now in dst), use the sendto() call:

    sendto(s, buf, buflen, 0, (struct sockaddr *)&dst,
           sizeof (dst));

In the above loop, one sendto() occurs for every interface the host is connected to that supports broadcast or point-to-point addressing. To send broadcast messages on only one given network, use code similar to that outlined above, but have the loop select only the destination address that belongs to that network.

Received broadcast messages contain the sender's address and port, since datagram sockets are bound before a message is allowed to go out.

IP Multicasting

The following is quoted from Request For Comments 1112, "Host Extensions for IP Multicasting," S. Deering, August 1989:

Multicasting is the transmission of an IP datagram to a host group, a set of zero or more hosts identified by a single IP destination address. A multicast datagram is delivered to all members of its destination host group with the same best-efforts reliability as regular unicast IP datagrams; that is, the datagram is not guaranteed to arrive intact at all members of the destination group or in the same order relative to other datagrams.

The membership of a host group is dynamic; that is, hosts may join and leave groups at any time. There is no restriction on the location or number of members in a host group. A host may be a member of more than one group at a time. A host need not be a member of a group to send datagrams to it.

A host group may be permanent or transient. A permanent group has a well-known, administratively assigned IP address. It is the address, not the membership of the group, that is permanent; at any time a permanent group may have any number of members, even zero. Those IP multicast addresses that are not reserved for permanent groups are available for dynamic assignment to transient groups, which exist as long as they have members.

In general, a host cannot assume that datagrams sent to any host group address will reach only the intended hosts, or that datagrams received as a member of a transient host group are intended for the recipient. Misdirected delivery must be detected at a level above IP, using higher-level identifiers or authentication tokens. Information transmitted to a host group address should be encrypted or governed by administrative routing controls if the sender is concerned about unwanted listeners.


Note: This RFC-1112 level-2 implementation of IP multicasting is experimental and subject to change in order to track future BSD UNIX releases. In particular, there may be changes in the way a process overrides the default interface for sending multicast datagrams and for joining multicast groups. This ability to override the default interface is intended mainly for routing daemons; normal applications should not be concerned with specific interfaces.

IP multicasting is currently supported only on AF_INET sockets of type SOCK_DGRAM and SOCK_RAW, and only on subnetworks for which the interface driver has been modified to support multicasting. The standard Ethernet, FDDI, and SLIP interfaces on the IRIS support multicasting. (Older versions of ENP-10 Ethernet interfaces may require an upgrade; see the IRIX Admin manual set for details.)

The next sections describe how to send and receive multicast datagrams.

Sending IP Multicast Datagrams

To send a multicast datagram, specify an IP multicast address in the range 224.0.0.0 to 239.255.255.255 as the destination address in a sendto() call.
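A program can check that a destination falls in this range before sending. The sketch below assumes the IN_MULTICAST() macro from <netinet/in.h> and the inet_aton() routine, both of which are common on BSD-derived systems; is_multicast_addr() is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: return 1 if the dotted-decimal string names an
 * IP multicast address, 0 if it does not, -1 if it cannot be parsed. */
int
is_multicast_addr(const char *dotted)
{
    struct in_addr a;

    if (inet_aton(dotted, &a) == 0)
        return -1;
    /* IN_MULTICAST() expects its argument in host byte order */
    return IN_MULTICAST(ntohl(a.s_addr)) ? 1 : 0;
}
```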

The definitions required for the multicast-related socket options are found in <netinet/in.h>. All IP addresses are passed in network byte order.

By default, IP multicast datagrams are sent with a time-to-live (TTL) of 1, which prevents them from being forwarded beyond a single subnetwork. A new socket option allows the TTL for subsequent multicast datagrams to be set to any value from 0 to 255, in order to control the scope of the multicasts:

u_char ttl;
setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl,
           sizeof(ttl));

Multicast datagrams with a TTL of 0 will not be transmitted on any subnetwork but may be delivered locally if the sending host belongs to the destination group and if multicast loopback has not been disabled on the sending socket. Multicast datagrams with a TTL greater than 1 may be delivered to more than one subnetwork if there is at least one multicast router attached to the first-hop subnetwork. To provide meaningful scope control, the multicast routers support the notion of TTL thresholds, which prevent datagrams with less than a certain TTL from traversing certain subnetworks.

The thresholds enforce the convention shown in Table 2-3.

Table 2-3. TTL Threshold Convention

Scope

Initial TTL

Restricted to the same host

0

Restricted to the same subnetwork

1

Restricted to the same site

32

Restricted to the same region

64

Restricted to the same continent

128

Unrestricted

255

“Sites” and “regions” are not strictly defined, and sites may be further subdivided into smaller administrative units, as a local matter.
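One way to keep the convention in Table 2-3 visible in source code is to name the scopes. The following is a sketch only: the enum constants and both function names are hypothetical, and socklen_t is assumed (older systems declare the length argument as int).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical names for the scopes listed in Table 2-3 */
enum mcast_scope {
    SCOPE_HOST = 0,
    SCOPE_SUBNET = 1,
    SCOPE_SITE = 32,
    SCOPE_REGION = 64,
    SCOPE_CONTINENT = 128,
    SCOPE_UNRESTRICTED = 255
};

/* Set the TTL for subsequent multicasts; returns setsockopt()'s result */
int
set_multicast_scope(int sock, enum mcast_scope scope)
{
    unsigned char ttl = (unsigned char) scope;

    return setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL,
                      &ttl, sizeof(ttl));
}

/* Read back the current multicast TTL, or return -1 on error */
int
get_multicast_ttl(int sock)
{
    unsigned char ttl;
    socklen_t len = sizeof(ttl);

    if (getsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len) < 0)
        return -1;
    return ttl;
}
```

On a freshly created datagram socket, get_multicast_ttl() would be expected to report the default TTL of 1 described above.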

An application may choose an initial TTL other than one listed in Table 2-3. For example, an application might perform an expanding-ring search for a network resource by sending a multicast query, first with a TTL of 0, and then with larger and larger TTLs, until a reply is received, perhaps using the TTL sequence 0, 1, 2, 4, 8, 16, 32.
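The TTL sequence for such a search can be generated by doubling. This is a minimal sketch; next_ring_ttl() is a hypothetical helper, and the doubling rule is just one reasonable way to produce the sequence given above.

```c
/* Hypothetical helper: return the next TTL to try in an expanding-ring
 * search, or -1 once the next step would exceed max_ttl. */
int
next_ring_ttl(int ttl, int max_ttl)
{
    int next = (ttl == 0) ? 1 : ttl * 2;

    return (next > max_ttl) ? -1 : next;
}
```

Starting from 0 with max_ttl set to 32, repeated calls step through 1, 2, 4, 8, 16, and 32 before returning -1. At each step the application would set IP_MULTICAST_TTL to the new value, send its query, and wait a bounded time for a reply.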

The multicast router mrouted (see mrouted(1M)) refuses to forward any multicast datagram with a destination address between 224.0.0.0 and 224.0.0.255, inclusive, regardless of its TTL. This range of addresses is reserved for the use of routing protocols and other low-level topology discovery or maintenance protocols, such as gateway discovery and group membership reporting.

The address 224.0.0.0 is guaranteed not to be assigned to any group, and 224.0.0.1 is assigned to the permanent group of all IP hosts (including gateways). This assignment convention is used to address all multicast hosts on the directly connected network. There is no multicast address (or any other IP address) for all hosts on the entire Internet. The addresses of other well-known, permanent groups are published in the “Assigned Numbers” RFC (Internet Request for Comments 1060).

Each multicast transmission is sent from a single network interface, even if the host has more than one multicast-capable interface. (If the host is also serving as a multicast router, a multicast may be forwarded to interfaces other than the originating interface, provided that the TTL is greater than 1.) The default interface to be used for multicasting is the primary network interface on the system. A socket option is available to override the default for subsequent transmissions from a given socket:

struct in_addr addr;
setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, &addr,
           sizeof(addr));

where addr is the local IP address of the desired outgoing interface. An address of INADDR_ANY may be used to revert to the default interface. The local IP address of an interface can be obtained via the SIOCGIFCONF ioctl. To determine if an interface supports multicasting, fetch the interface flags via the SIOCGIFFLAGS ioctl and see if the IFF_MULTICAST flag is set. (Normal applications should not need to use this option; it is intended primarily for multicast routers and other system services specifically concerned with Internet topology.) The SIOCGIFCONF and SIOCGIFFLAGS ioctls are described in “Broadcasting”.

If a multicast datagram is sent to a group to which the sending host itself belongs (on the outgoing interface), a copy of the datagram is, by default, looped back by the IP layer for local delivery. Another socket option gives the sender explicit control over whether or not subsequent datagrams are looped back:

u_char loop;
setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, &loop,
           sizeof(loop));

Set loop to 0 to disable loopback or to 1 to enable it. Disabling loopback improves performance for applications that may have no more than one instance on a single host (such as a router daemon) by eliminating the overhead of receiving their own transmissions. In general, do not disable loopback in applications for which there may be more than one instance on a single host (such as a conferencing program) or for which the sender does not belong to the destination group (such as a time-querying program).

A multicast datagram sent with an initial TTL greater than 1 may be delivered to the sending host on a different interface from that on which it was sent if the host belongs to the destination group on that other interface. The loopback control option has no effect on such delivery.

Receiving IP Multicast Datagrams

Before a host can receive IP multicast datagrams, it must become a member of one or more IP multicast groups. A process can ask the host to join a multicast group by using this socket option:

struct ip_mreq mreq;
setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq,
           sizeof(mreq));

mreq has the following structure:

struct ip_mreq {
    struct in_addr imr_multiaddr; /*multicast group to join*/
    struct in_addr imr_interface; /*interface to join on*/
};

Every membership is associated with a single interface, and it is possible to join the same group on more than one interface. imr_interface should be INADDR_ANY to choose the default multicast interface or one of the host's local addresses to choose a particular (multicast-capable) interface. Up to IP_MAX_MEMBERSHIPS (currently 20) memberships may be added on a single socket.

To drop a membership, use:

struct ip_mreq mreq;
setsockopt(sock, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq,
           sizeof(mreq));

where mreq contains the same values as used to add the membership. The memberships associated with a socket are also dropped when the socket is closed or the process holding the socket is killed. However, more than one socket may claim a membership in a particular group, and the host will remain a member of that group until the last claim is dropped.

The memberships associated with a socket do not necessarily determine which datagrams are received on that socket. Incoming multicast packets are accepted by the kernel IP layer if any socket has claimed a membership in the destination group of the datagram; however, delivery of a multicast datagram to a particular socket is based on the destination port (or protocol type for raw sockets), just as with unicast datagrams. To receive multicast datagrams sent to a particular port, it is necessary to bind to that local port, leaving the local address unspecified (that is, INADDR_ANY).

More than one process may bind to the same SOCK_DGRAM UDP port if the bind() call is preceded by:

int on = 1;
setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));

In this case, every incoming multicast or broadcast UDP datagram destined to the shared port is delivered to all sockets bound to the port. For backward compatibility reasons, this does not apply to incoming unicast datagrams. Unicast datagrams are never delivered to more than one socket, regardless of how many sockets are bound to the datagram's destination port. SOCK_RAW sockets do not require the SO_REUSEPORT option to share a single IP protocol type.


Note: A final multicast-related extension is independent of IP: two new ioctls, SIOCADDMULTI and SIOCDELMULTI, are available to add or delete link-level (for example, Ethernet) multicast addresses accepted by a particular interface. The address to be added or deleted is passed as a sockaddr structure of family AF_UNSPEC, within the standard ifreq structure.

These ioctls are used for protocols other than IP and require superuser privileges. A link-level multicast address added via SIOCADDMULTI is not automatically deleted when the socket used to add it goes away; it must be explicitly deleted. It is inadvisable to delete a link-level address that may be in use by IP.

Sample Multicast Program

The following program sends or receives multicast packets. If invoked with one argument, it sends a packet containing the current time to an arbitrarily chosen multicast group and UDP port. If invoked with no arguments, it receives and prints these packets. Start it as a sender on just one host and as a receiver on all the other hosts.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <string.h>     /* memset() */
#include <unistd.h>     /* sleep() */

#define EXAMPLE_PORT    6000
#define EXAMPLE_GROUP    "224.0.0.250"

int
main(int argc, char **argv)
{
    struct sockaddr_in addr;
    int    addrlen, fd, cnt;
    struct ip_mreq mreq;
    char message[50];

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        exit(1);
    }
    /* Destination address for the sender, local address for the receiver */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(EXAMPLE_PORT);
    addrlen = sizeof(addr);
    if (argc > 1) {    /* Send */
        addr.sin_addr.s_addr = inet_addr(EXAMPLE_GROUP);
        while (1) {
            time_t t = time(0);
            sprintf(message, "time is %-24.24s", ctime(&t));
            cnt = sendto(fd, message, sizeof(message), 0,
                         (struct sockaddr *) &addr, addrlen);
            if (cnt < 0) {
                perror("sendto");
                exit(1);
            }
            sleep(5);
        }
    } else {        /* Receive */
        if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
            perror("bind");
            exit(1);
        }
        mreq.imr_multiaddr.s_addr = inet_addr(EXAMPLE_GROUP);
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
            &mreq, sizeof(mreq)) < 0) {
            perror("setsockopt mreq");
            exit(1);
        }
        while (1) {
            addrlen = sizeof(addr);    /* value-result argument */
            cnt = recvfrom(fd, message, sizeof(message), 0,
                           (struct sockaddr *) &addr, &addrlen);
            if (cnt < 0) {
                perror("recvfrom");
                exit(1);
            } else if (cnt == 0) {
                break;
            }
            printf("%s: message = \"%s\"\n",
                   inet_ntoa(addr.sin_addr), message);
        }
    }
    return 0;
}