Chapter 1. Understanding ONC3/NFS

This chapter introduces the SGI implementation of the Sun Microsystems Open Network Computing Plus (ONC+) distributed services, which was previously referred to as Network filesystem (NFS). In this guide, NFS refers to the distributed network filesystem in ONC3/NFS.

The information in this chapter is prerequisite to successful ONC3/NFS administration. It defines ONC3/NFS and its relationship to other network software, introduces the ONC3/NFS vocabulary, and identifies the software elements that support ONC3/NFS operation. It also explains special utilities and implementation features of ONC3/NFS.

This chapter contains these sections:

  • “Overview of ONC3/NFS”

  • “ONC3/NFS Components”

  • “About NFS”

  • “Client-Server Fundamentals”

  • “NFS Protocol”

  • “NFS Input/Output Management”

  • “NFS File Locking Service”

Overview of ONC3/NFS

ONC3/NFS is the SGI implementation of ONC+ distributed services. ONC3/NFS is optimized for SGI systems and integrated with the IRIX Interactive Desktop environment and system toolchest. ONC3/NFS can run only on an SGI system.

ONC3/NFS is made up of distributed services that allow users to access filesystems and directories on remote systems and treat them as if they were local. Networks with heterogeneous architectures and operating systems can participate in the same ONC3/NFS service. The service can also include systems connected to different types of networks.

ONC3/NFS is a separate software product, and must be installed on both server and client. However, you should be familiar with the information in this chapter before setting up or modifying the ONC3/NFS environment.

ONC3/NFS Components

This section summarizes the components of ONC3/NFS; the sections that follow in this chapter describe them in more detail.

NFS 

The Network filesystem (NFS) is the distributed network filesystem in ONC3/NFS. It contains server and client components that enable access to remote files. ONC3/NFS supports NFS version 3 (NFS3) and NFS version 2, and uses NFS3 by default. NFS is multithreaded to take advantage of multiprocessor performance. For more about NFS, see “About NFS”.

NIS 

The network information service (NIS) is a database of network entity location information that can be used by NFS. NIS is implemented as part of the Unified Name Service (UNS). Information about NIS and UNS is published in a separate volume called the NIS Administrator's Guide. For more about the interaction of NFS with NIS, see “About NFS and the Network Information Service”.

AutoFS 

The AutoFS filesystem (AutoFS), introduced in IRIX 6.2, is an implementation of the automatic mounter that uses the autofs command instead of automount. Like automount, autofs provides automatic and transparent NFS mounts upon access of specified AutoFS filesystems. autofs differs from automount mainly by providing multithreaded service, in-place mounts, and use of the LoFS (loopback filesystem) to access local filesystems; it also accepts dynamic configuration updates. Unlike automount, autofs access cannot be blocked by a server that is down or responding slowly: one thread may block, but this does not prevent other references through autofs from completing. autofs and automount cannot exist on the same system. By default, autofs is enabled upon installation, although automount can be selected by using chkconfig. For further information about AutoFS, refer to “About the AutoFS Filesystem”.

CacheFS 

The Cache filesystem (CacheFS), introduced in IRIX 5.3, provides client-side caching for NFS and other filesystem types. Using CacheFS on NFS clients with local disk space can significantly increase the number of clients a server can support and reduce the data access time for clients using read-only filesystems. For more about CacheFS, refer to “About CacheFS Filesystem”.

Bulk Data Service 

The SGI implementation of the Bulk Data Service protocol, BDSpro, is available as an option for NFS. BDSpro is an extension to NFS for handling large file transactions over high-speed networks. BDSpro exploits the data access speed of the XFS filesystem and data transfer rates of network media, such as HIPPI and Fibre Channel, to accelerate standard NFS performance. The BDS protocol modifies NFS functions to reduce the time needed to transfer files of 100 megabytes or larger over a network connection. For more information about BDSpro, refer to Getting Started With BDSpro.

About NFS

NFS is a network service that allows users to access file hierarchies across a network in such a way that they appear to be local. File hierarchies can be entire filesystems or individual directories. Systems participating in the NFS service can be heterogeneous. They may be manufactured by different vendors, use different operating systems, and be connected to networks with different architectures. These differences are transparent to the NFS application.

NFS is an application layer service that can be used on a network running the User Datagram Protocol (UDP) or Transmission Control Protocol (TCP). UDP has traditionally been used as the transport layer protocol; it supports connectionless transmissions and stateless operation, providing robust service. TCP supports connection-based transmissions that are beneficial in WAN configurations. TCP provides high reliability and, with its sophisticated packet tracking scheme, reduces client and server input buffer overflow and multiple packet resends.

NFS relies on remote procedure calls (RPC) for session layer services and external data representation (XDR) for presentation layer services. XDR is a library of routines that translate data formats between processes.

Figure 1-1 illustrates the NFS software implementation in the context of the Open Systems Interconnect (OSI) model.

Figure 1-1. NFS Software Implementation


NFS and Diskless Workstations

It is possible to set up a system so that all the required software, including the operating system, is supplied from remote systems by means of the NFS service. Workstations operating in this manner are considered diskless workstations, even though they may be equipped with a local disk.

Instructions for implementing diskless workstations are given in the Diskless Workstation Administration Guide. However, it is important to acquire a working knowledge of NFS before setting up a diskless system.

About NFS and the Network Information Service

The network information service (NIS) is a database service that provides location information about network entities to other network servers and applications, such as NFS. NFS and NIS are independent services that may or may not be operating together on a given network. On networks running NIS, NFS may use the NIS databases to locate systems when NIS queries are specified.

About UNS and NIS

The Unified Name Service (UNS) provides a system-wide interface to hostname, password, and many other lookups. It controls the resolution of hostnames used by AutoFS and automount. Both AutoFS and automount bypass UNS when using information from NIS.

About the AutoFS Filesystem

AutoFS is the kernel virtual filesystem that supports automatic mounting of filesystems. Together with the implementation of autofsd (the autofs daemon), AutoFS solves several fundamental problems with the earlier implementation of the automount daemon:

  • The symbolic links and the /tmp_mnt prepended to paths are replaced by in-place mounting.

  • AutoFS is filesystem independent.

By default, AutoFS tries NFS version 3 first, and if the server does not support version 3, AutoFS retries the mount using NFS2.

Without symbolic links, indirection to mount points is now performed entirely within the kernel, improving performance. autofsd is now a stateless daemon, responsible for performing automatic mounts and unmounts. It allows mount points to be added or deleted without rebooting. The daemon is not required to access a filesystem once it is mounted.

In addition, autofsd can mount filesystems besides NFS, such as removable-media filesystems. These improvements are compatible with previously existing maps and administrative procedures.

Simplified autofs Operation

The automatic mounting daemon, autofsd, starts at boot time from the /etc/init.d/network script because, by default, autofs and nfs are turned on using the chkconfig command. The /etc/init.d/network script also runs the autofs command, which reads a master map and installs AutoFS mount points.
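For example, the relevant configuration flags can be inspected or changed with the chkconfig command. The following is a sketch only (run as superuser); the flag names are those mentioned above, and a change takes effect the next time the network script runs, for example at reboot:

    # List the current state of the relevant configuration flags
    chkconfig | egrep 'nfs|autofs|automount'

    # Select the older automount instead of the default autofs
    chkconfig autofs off
    chkconfig automount on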

Unlike mount, autofs does not read the file /etc/fstab, which is specific to each workstation, for a list of filesystems to mount. Rather, autofs is controlled within a domain (and on particular workstations) through the maps, saving a great deal of administrator time.

How autofs Navigates Through the Network (Maps)

The autofs command searches a series of maps to navigate its way through the network. Maps are files that contain information mapping local directories or mount points to remote server filesystems. A special map is supported by AutoFS to provide a convenient way of accessing all host machines on the network. To access this map, use the -hosts option with the autofs command. Maps are available locally or through a network name service like NIS or NIS+. You create maps to meet the needs of your users' environment. See “NFS Automatic Mounting”, and Chapter 3, “Using Automatic Mounter Map Options”, for detailed information on automatic mounting and its maps.
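As an illustration, a master map entry associates a mount point with a map name and optional mount options. The entry below is a sketch only; the /hosts mount point, the map layout, and the nosuid option are assumptions for this example rather than values required by AutoFS:

    # Format: mount-point   map-name   [mount-options]
    /hosts                  -hosts     -nosuid

With an entry like this, accessing /hosts/servername causes the automatic mounter to mount the filesystems exported by the server servername.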

About CacheFS Filesystem

A cache is a temporary storage area for data. With the cache filesystem (CacheFS), you can store frequently used data from a remote filesystem or CD-ROM on the local disk drive of a workstation. The data stored on the local disk is the cache.

When a filesystem is cached, the data is read from the original filesystem and stored on the local disk. The reduction in network traffic improves performance. If the remote filesystem is on a storage medium with slower response time than the local disk (such as a CD–ROM), caching provides an additional performance gain.

CacheFS can use all or part of a local disk to store data from one or more remote filesystems. A user accessing a file does not need to know whether the file is stored in a cache or is being read from the original filesystem. The user opens, reads, and writes files as usual.

A cache with default parameters can be created by using the mount command. Default parameters can be changed by using the cfsadmin command. See “Cached File System Administration” in Chapter 2 and “Cache Resource Parameters in CacheFS” in Chapter 2. Specific details of CacheFS are discussed in “Planning a CacheFS File System” in Chapter 2.
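The following sketch shows the general form of these commands; the server name, exported path, cache directory, and mount point are hypothetical, and the exact option syntax is covered in Chapter 2:

    # Mount an NFS filesystem through a local cache; a cache with
    # default parameters is created in /local/cache if none exists
    mount -t cachefs -o backfstype=nfs,cachedir=/local/cache \
        server1:/usr/share/man /usr/share/man

    # List the parameters of the cache (cfsadmin can also change them)
    cfsadmin -l /local/cache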

Client-Server Fundamentals

In an NFS transaction, the workstation requesting access to remote directories is known as the client. The workstation providing access to its local directories is known as the server. A workstation can function as a client and a server simultaneously. It can allow remote access to its local filesystems while accessing remote directories with NFS. The client-server relationship is established by two complementary processes: exporting and mounting.

Exporting NFS Filesystems

Exporting is the process by which an NFS server provides access to its file resources to remote clients. Individual directories, as well as filesystems, can be exported, but exported entities are usually referred to as filesystems. Exporting is done either during the server's boot sequence or from a command line as superuser while the server is running.

After a filesystem is exported, any authorized client can use it. A list of exported filesystems, client authorizations, and other export options are specified in the /etc/exports file (see “Operation of /etc/exports and Other Export Files” in Chapter 2 for details). Exported filesystems are removed from NFS service by a process known as unexporting.

A server can export any filesystem or directory that is local. However, it cannot export both a parent and child directory within the same filesystem; to do so is redundant.

For example, assume that the filesystem /usr contains the directory /usr/demos. As the child of /usr, /usr/demos is automatically exported with /usr. For this reason, attempting to export both /usr and /usr/demos generates an error message that the parent directory is already exported. If /usr and /usr/demos were separate filesystems, this example would be valid.

When exporting hierarchically related filesystems, such as /usr and /usr/demos in the previous example, we recommend the use of the -nohide option to reduce the number of mounts required by clients (see the exports(4) man page).
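As an illustration only, assume that /usr and /usr/demos are separate filesystems on the server; the client names below are hypothetical, and the full option syntax is described in exports(4) and in Chapter 2:

    # /etc/exports on the server
    /usr         -access=client1:client2
    /usr/demos   -nohide,ro,access=client1:client2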

Mounting NFS Filesystems

Mounting is the process by which filesystems, including NFS filesystems, are made available to the IRIX operating system and consequently, the user. When NFS filesystems or directories are mounted, they are made available to the client over the network by a series of remote procedure calls that enable the client to access the filesystem transparently from the server's disk. Mounted NFS directories or filesystems are not physically present on the client system, but the mount looks like a local mount and users enter commands as if the filesystems were local.

NFS clients can have directories mounted from several servers simultaneously. Mounting can be done as part of the client's boot sequence; automatically, at filesystem access, with the help of a user-level daemon; or with a superuser command after the client is running. When mounted directories are no longer needed, they can be relinquished in a process known as unmounting.

Like locally mounted filesystems, NFS-mounted filesystems and directories can be specified in the /etc/fstab file (see “Operation of /etc/fstab and Other Mount Files” in Chapter 2 for details). Since NFS filesystems are located on remote systems, specifications for NFS mounted resources must include the name of the system where they reside.
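For example, an NFS entry in the client's /etc/fstab might look like the following sketch; the server name and paths are hypothetical:

    # filesystem          mount point   type  options  frequency  pass
    server1:/usr/demos    /n/demos      nfs   ro,bg    0          0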

NFS Mount Points

The access point in the client filesystem where an NFS directory is attached is known as a mount point. A mount point is specified by a conventional IRIX pathname.

Figure 1-2 illustrates the effect of mounting directories onto mount points on an NFS client.

Figure 1-2. Sample Mounted Directory


The pathname of a filesystem on a server can be different from its mount point on the client. For example, in Figure 1-2 the filesystem /usr/demos is mounted in the client's filesystem at mount point /n/demos. Users on the client gain access to the server's mounted directory through the /n/demos pathname.
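A mount like the one shown in Figure 1-2 can also be performed manually by the superuser on the client; the server name here is hypothetical:

    # Create the mount point, then mount the server's /usr/demos on it
    mkdir -p /n/demos
    mount server1:/usr/demos /n/demos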

NFS Mount Restrictions

NFS does not permit multihopping, mounting a directory that is itself NFS mounted on the server. For example, if host1 mounts /usr/demos from host2, host3 cannot mount /usr/demos from host1. This would constitute a multihop.

NFS also does not permit loopback mounting, mounting a directory that is local to the client via NFS. For example, the local filesystem /usr on host1 cannot be NFS mounted to host1; this would constitute a loopback mount.

NFS Automatic Mounting

As an alternative to standard mounting using /etc/fstab or the mount command, NFS provides two automatic mounting utilities: the original automatic mounter, called automount, and a newer implementation introduced in IRIX 6.2, called autofs. Both automatic mounters dynamically mount filesystems when they are referenced by any user on the client system, then unmount them after a specified time interval. Unlike standard mounting, automount and autofs, once set up, do not require superuser privileges to mount a remote directory. They also create the mount points needed to access the mounted resource. NFS servers cannot distinguish between directories mounted by the automatic mounters and those mounted by conventional mount procedures. autofs and automount cannot coexist on the same system.

Unlike the standard mount process, automount and autofs do not read the /etc/fstab file for mount specifications. Instead, they read alternative files (either local or through NIS), known as maps, for mounting information (see “Operation of Automatic Mounter Files and Maps” in Chapter 2 for details). They also provide special maps for accessing remote systems and automatically reflecting changes in the /etc/hosts file and any changes to the remote server's /etc/exports file.

Default configuration information for automatic mounting is contained in the files /etc/config/automount.options (for automount) and /etc/config/autofs.options (for autofs). These files can be modified to use different options and more sophisticated maps.
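As a sketch of what such a file might contain (contents vary by site and release, so treat these lines as an assumption rather than the shipped defaults), the options file simply holds command-line options passed to the automatic mounter at startup:

    # Possible contents of /etc/config/autofs.options
    # -v enables verbose logging; "/hosts -hosts" installs the special
    # -hosts map at the /hosts mount point
    -v /hosts -hosts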

NFS Protocol

The NFS protocol is stateless; that is, the server maintains almost no information about its NFS clients. The stateless nature of the protocol insulates clients and servers from the effects of failures. If a server fails, the only effect on clients is that NFS data on the server is unavailable to them. If a client fails, server performance is not affected. Clients are independently responsible for completing NFS transactions if the server or network fails. By default, when a failure occurs, NFS clients continue attempting to complete the NFS operation until the server or network recovers. To the client, the failure can appear as slow performance on the part of the server. Client applications continue retransmitting until service is restored and their NFS operations can be completed. If a client fails, no action is needed by the server or its administrator for the server to continue operation.

NFS can use either the connectionless UDP protocol or the connection-oriented TCP protocol to transmit data between the client and the server.

UDP is a low-overhead protocol for transmitting packets, which makes it a good candidate for the NFS transport protocol on reliable local networks where packet loss is minimal. UDP was the default protocol used by NFS up to and including the IRIX 6.5.23 release.

TCP provides a highly efficient method for transmitting packets, especially in large wide-area networks or on busy local networks. With TCP, a connection is made between the client and the server, and all packets are labeled and tracked. Even though this tracking is more CPU-intensive, the larger block transfer size, congestion control, and automatic retransmission handling make TCP almost as efficient as UDP when used as the NFS transport protocol. Starting with the IRIX 6.5.24 release, TCP is the default protocol for NFS.
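When the defaults are not appropriate, the transport protocol and NFS version can usually be selected for an individual mount with mount options. The vers and proto option names below are the conventional NFS mount options and, together with the server name and paths, should be treated as assumptions to verify against the mount documentation for your release:

    # Request NFS version 2 over UDP for this mount
    mount -o vers=2,proto=udp server1:/export/data /mnt/data

    # Explicitly request NFS3 over TCP
    mount -o vers=3,proto=tcp server1:/export/data /mnt/data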

NFS Input/Output Management

In NFS2 transactions, data input and output are asynchronous, using read-ahead and write-behind, unless otherwise specified. As the server receives data, it notifies the client that the data was successfully written. The client responds by freeing the blocks of NFS data successfully transmitted to the server. In reality, however, the server might not write the data to disk before notifying the client, a technique called delayed writes. Writes are done when they are convenient for the server, but at least every 30 seconds. NFS2 uses delayed writes by default.

With synchronous writes, the server writes the data to disk before notifying the client that it has been written. Synchronous writes are supported as an option in NFS2 (see “/etc/exports Options” in Chapter 2 for details of NFS options), and in NFS3. Synchronous writes may slow NFS performance due to the time required for disk access, but increase data integrity in the event of system or network failure.

In the IRIX 6.5.24 release, support for direct I/O was added for NFS3. By opening files with the O_DIRECT flag in the arguments to the open(2) system call, client applications can avoid data caching on the client and force all I/O to be carried out directly to and from the application's buffers, provided that certain requirements for buffer size and alignment are met.

NFS File Locking Service

To help manage file access conflicts and protect NFS sessions during failures, NFS offers a file and record locking service called the network lock manager. The network lock manager is a separate service NFS makes available to user applications. To use the locking service, applications must make calls to standard IRIX lock routines (see the fcntl(2), flock(3B), and lockf(3C) man pages). For NFS files, these calls are sent to the network lock manager process (see the lockd(1M) man page) on the server.

The network lock manager processes must run on both client and server. Communication between the two processes is by means of RPC. Calls issued to the client process are handed to the server process, which uses its local IRIX locking utilities to handle the call. If the file is in use, the lock manager issues an advisory to the calling application, but it does not prevent the application from accessing a busy file. The application must determine how to respond to the advisory, using its own facilities.

Despite the fact that the network lock manager adheres to lockf and fcntl semantics, its operating characteristics are influenced by the nature of the network, particularly during crashes.

NFS Locking and Crash Recovery

As part of the file locking service, the network lock manager assists with crash recovery by maintaining state information on locked files. It uses this information to reconstruct locks in the event of a server or client failure.

When an NFS client goes down, the lock managers on all of its servers are notified by their status monitors, and they simply release their locks, on the assumption that the client will request them again when it wants them. When a server crashes, however, matters are different. When the server comes back up, its lock manager gives the client lock managers a grace period to submit lock reclaim requests. During this period, the lock manager accepts only reclaim requests. The client status monitors notify their respective lock managers when the server recovers. The default grace period is 45 seconds.

After a server crash, a client may not be able to recover a lock that it had on a file on that server, because another process may have beaten the recovering application process to the lock. In this case the SIGLOST signal is sent to the process (the default action for this signal is to kill the application).

NFS Locking and the Network Status Monitor

To handle crash recoveries, the network lock manager relies on information provided by the network status monitor. The network status monitor is a general service that provides information about network systems to network services and applications. The network status monitor notifies the network lock manager when a network system recovers from a failure, and by implication, that the system failed. This notification alerts the network lock manager to retransmit lock recovery information to the server.

To use the network status monitor, the network lock manager registers with the status monitor process (see the statd(1M) man page ) the names of clients and servers for which it needs information. The network status monitor then tracks the status of those systems and notifies the network lock manager when one of them recovers from a failure.