Chapter 7. Troubleshooting Diskless Installations

This chapter provides information to help you correct problems that might occur in your diskless implementation. It also explains what to do before you call the Silicon Graphics Technical Assistance Center and how to prepare for your call if you determine that one is necessary.

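This chapter contains these sections:

  • General Approach to Troubleshooting

  • Troubleshooting Checklist

  • Installation Error Messages

  • Removing a Diskless Class Manually

  • Debugging the Boot Process

  • Handling Booting Error Messages

  • Before You Call for Help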

General Approach to Troubleshooting

Frequently, problems with a diskless implementation are due to problems with the network on which it is implemented. If your diskless service is not working as expected, first determine whether there are network problems and correct any you find; then look at the diskless environment to correct any problems that persist.

Problems that are specific to the diskless environment are usually caused by one of these general conditions:

  • Diskless software is improperly configured.

  • The boot process is not completing successfully.

Troubleshooting Checklist

Some problems you might encounter with your diskless configuration can be solved by considering these questions:

  1. Did you install an official IRIX release that both the client and server support?

    If you suspect that the software you are using is not an officially released version of IRIX (such as beta software, for example), remove and reinstall the share tree and any client trees made from the share tree.

  2. Has any software in the diskless tree been modified since share_inst or client_inst installed it?

    You can answer this question with a simple test. Try to remove the client tree with client_inst. At the beginning of this procedure, client_inst checks various tables and files to verify that they are in the same form as when it built them. If client_inst finds anything changed, it refuses to remove the client tree.


    Note: You can safely do this test without deleting the client tree, because client_inst will ask you for confirmation before it removes anything on the tree.


  3. Are hardware variables set correctly in share.dat and client.dat?

    Values for CPUBOARD, CPUARCH, GFXBOARD, and VIDEO must be set correctly during software installation and should not be changed once the share tree is built. All hardware settings in individual client.dat files must be represented by a corresponding setting in the share.dat file that defines their share tree.

  4. Did you try to run lboot on the share tree or a client tree?

    If you ran lboot, remove and reinstall all software on any share and client tree where you ran this utility.

  5. Are the versions of IRIX on the share tree suitable for the server?

    Although you can run any version of IRIX on a server, the share trees it supports cannot be running later software versions than the server itself. To eliminate the possibility of compatibility problems, it is recommended that the version of IRIX on share trees be the same as the version of IRIX on the server supporting them, particularly if you are running maintenance releases.
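The hardware check in question 3 can be sketched as a small shell function. This is a minimal sketch, not part of the diskless software: it assumes that share.dat and client.dat hold shell-style assignments such as CPUBOARD="IP22", with multiple values separated by spaces. The function names dat_value and check_hw_vars are hypothetical.

```sh
#!/bin/sh
# Sketch: confirm that every hardware value in a client.dat file also
# appears in the share.dat file that defines its share tree.
# ASSUMPTION: both files contain shell-style lines such as CPUBOARD="IP22",
# with multiple values separated by spaces.

# Print the value(s) assigned to variable $1 in file $2, quotes stripped.
dat_value() {
    sed -n "s/^[[:space:]]*$1=//p" "$2" | tail -1 | tr -d "\"'"
}

# Return 0 if every hardware value in client file $2 is listed in share file $1.
check_hw_vars() {
    share=$1 client=$2 status=0
    for var in CPUBOARD CPUARCH GFXBOARD VIDEO; do
        share_vals=$(dat_value "$var" "$share")
        for val in $(dat_value "$var" "$client"); do
            case " $share_vals " in
                *" $val "*) ;;                      # value is represented
                *) echo "$var=$val is in $client but not in $share"
                   status=1 ;;
            esac
        done
    done
    return $status
}
```

For example, check_hw_vars /var/boot/IRIX_51.dat /var/boot/starlite.dat (both file names are placeholders) prints one line for each client setting that has no counterpart in the share tree definition.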

Installation Error Messages

Table 7-1 shows error messages that can be displayed when you are using share_inst or client_inst to build a diskless tree. It also suggests what you can do to correct the problems.

Table 7-1. Installation Error Messages

Error Message

What to Check

/var/boot/classname.dat not found

The client.dat file for the share tree that client_inst is trying to build is missing. Check /var/boot to make sure that the appropriate client.dat is listed and check your spelling of the file name.

Invalid hostname_clientname

client_inst cannot find the client's name in a hosts database. If NIS=no in the configuration file for this client, check the server's local /etc/hosts file for the client's entry. If NIS=yes in the configuration file for this client, check the NIS hosts database for the client and verify that the diskless server is an NIS client.

<clientname> is not a diskless client for CLASS classname

You are trying to remove a client from the wrong diskless server or client_inst cannot determine the client-server relationship. If NIS=no in the configuration file for this client, check the server's local /etc/bootparams file for correct entries. If NIS=yes in the configuration file for this client, check the NIS bootparams database and verify that the diskless server is an NIS client.
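For clients configured with NIS=no, the lookups suggested in Table 7-1 can be approximated with the following sketch. It reads only the local files that you pass to it; the function names are hypothetical, and the bootparams check assumes that entries are keyed by the client name and may continue across lines with a trailing backslash.

```sh
#!/bin/sh
# Sketch of the local-file checks suggested for the "Invalid hostname" and
# "is not a diskless client" messages.
# ASSUMPTION: NIS=no, so the server's local files are authoritative.
# For NIS=yes clients, "ypmatch starlite hosts" and
# "ypmatch starlite bootparams" are the likely equivalents (not covered here).

# Return 0 if host name $1 appears as a name or alias in hosts file $2.
host_in_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                                  # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$2"
}

# Return 0 if bootparams file $2 has an entry whose key is $1.
client_in_bootparams() {
    awk -v name="$1" '
        !continued && $1 == name { found = 1 }   # only entry-starting lines
        { continued = ($0 ~ /\\[ \t]*$/) }       # trailing backslash continues
        END { exit !found }
    ' "$2"
}
```

A typical use is host_in_hosts starlite /etc/hosts || echo "starlite missing from /etc/hosts". Note that the comparison is exact, so a client listed only by its fully qualified name does not match its short name.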


Removing a Diskless Class Manually

When you cannot use share_inst or client_inst to remove diskless software, use this procedure to remove a diskless tree manually. Do this procedure as the superuser on the server.

  1. Remove the share tree.

    This command sequence changes to the root directory for share trees and removes the share tree IRIX_51, using the recursive option of the rm command:

    # cd /diskless/share
    # rm -r IRIX_51
    

  2. Remove the client trees.

    This command sequence changes to the root directory for client trees and removes the client tree for starlite. Repeat the rm -r command for each client you need to remove:

    # cd /diskless/client
    # rm -r starlite
    

  3. Remove the swap trees in the class.

    This command sequence changes to the root directory for swap trees and removes the swap tree for starlite. Repeat the rm -r command for each client you need to remove:

    # cd /diskless/swap
    # rm -r starlite
    

  4. Clean the /etc/exports file on the server.

    The share, client, and swap tree entries are listed in the server's /etc/exports file. Unexport the directories one at a time with exportfs -u, as shown in these examples, and then delete only the corresponding lines from /etc/exports so that entries for other filesystems are preserved:

    # exportfs -u /diskless/client/starlite
    # exportfs -u /diskless/swap/starlite
    # exportfs -u /diskless/share/IRIX_51
    

  5. Clean unwanted entries in the /etc/bootparams file on the server.

    If you want to retain other share trees, be very careful. You should delete only the individual entries for an unwanted client or share tree. An /etc/bootparams entry looks like the following example.

        dayglo.bldg9.dude.com root=babylon:/d1/client/dayglo \
             sbin=babylon:/d1/share/5_1/sbin swap=babylon:/d1/swap/dayglo
        5_1.a109 root=babylon:/diskless/sh109/5_1.a109 sbin=babylon: swap=babylon:
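
Because a single bootparams entry can span several lines, deleting one by hand is easy to get wrong. The following is a minimal sketch of a filter that leaves out one entry together with its continuation lines. The function name bootparams_without is hypothetical, and the sketch assumes that continuation lines end with a backslash, as in the example above. It writes to standard output and never modifies the input file.

```sh
#!/bin/sh
# Sketch: print bootparams file $2 with the entry for key $1 (and its
# backslash-continued lines) left out. The input file is not modified.
# ASSUMPTION: an entry starts with its key in the first field, and a
# trailing backslash means that the entry continues on the next line.
bootparams_without() {
    awk -v name="$1" '
        !continued { skip = ($1 == name) }   # a new entry starts on this line
        { continued = ($0 ~ /\\[ \t]*$/) }   # does this line continue onward?
        !skip { print }
    ' "$2"
}
```

One cautious way to use it is to write the result to a scratch file, for example bootparams_without starlite /etc/bootparams > /tmp/bootparams.new, compare the two files with diff, and copy the new file into place only after you confirm that nothing else changed.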
    

Debugging the Boot Process

Occasionally, a client does not boot after a diskless software installation or upgrade. A number of factors can contribute to this problem. To diagnose booting problems, it is helpful to understand the boot process in some detail. The four phases of the boot process are described below.

Phase 1: The Boot Request

In phase 1, the diskless client initiates a bootp request. This request can be in the form of a broadcast going to all servers on the same network, or a specific request to a known diskless server. If a router is installed between the server and its client, the router must be configured to forward bootp requests (see the -f argument to bootp in the bootp(1M) reference page). The transport mechanism is the User Datagram Protocol (UDP/IP).

Once the bootp request reaches the diskless server, the super server inetd starts the bootp server. bootp is responsible for resolving the client's identity by examining its configuration file, /var/bootptab, the /etc/ethers file, and the /etc/hosts file, in that order. The bootp server is also responsible for resolving the boot filename, which is specified in the client's PROM variable, bootfile.

Phase 2: The Boot File Download

When the bootp server has resolved the client identity and boot file location, the client sends a TFTP (Trivial File Transfer Protocol) request for its boot file, /var/boot/client_name/unix, which invokes the tftpd server on the diskless server. The server sends the kernel and the client loads it into memory.

Phase 3: The Server-Client Setup

Once the client has loaded the kernel into memory, it broadcasts an RPC request to the server. The inetd daemon on the diskless server starts the bootparamd server. The bootparamd server provides the information that the diskless client needs for booting, primarily the location of its root, share, and swap trees. This information is contained in the bootparams database file, /etc/bootparams, either locally or in the NIS databases. The server-client relationship is then sent back to the client.

Phase 4: Client Request for Software

After the server-client relationship is confirmed, the client issues an NFS request to the server for its root, sbin, and swap filesystems. The server then starts the rpc.mountd server to answer the request (rpc.mountd is responsible for verifying access to a filesystem by clients and users). If the client has sufficient permission to mount the requested filesystems, the mount and boot processes are completed.
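
The four phases suggest a server-side check for each one. The following sketch is one way to run those checks for a single client. It rests on several assumptions: the client uses NIS=no, phase 1 is tested against /etc/hosts only (not /var/bootptab or /etc/ethers), and the client tree is exported under a path whose last component is the client name. The function name and the optional directory prefix argument are hypothetical.

```sh
#!/bin/sh
# Sketch: check one server-side prerequisite for each boot phase of a client.
# ASSUMPTIONS: NIS=no (local files are authoritative); the optional second
# argument is a hypothetical directory prefix, useful for checking copies.
check_boot_prereqs() {
    client=$1 root=${2:-} rc=0

    # Phase 1: bootp must resolve the client (here, through /etc/hosts only).
    awk -v c="$client" '
        { sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == c) f = 1 }
        END { exit !f }
    ' "$root/etc/hosts" ||
        { echo "phase 1: $client not in $root/etc/hosts"; rc=1; }

    # Phase 2: the TFTP request needs the client kernel to exist.
    [ -f "$root/var/boot/$client/unix" ] ||
        { echo "phase 2: $root/var/boot/$client/unix missing"; rc=1; }

    # Phase 3: bootparamd needs a bootparams entry keyed by the client name.
    awk -v c="$client" '
        !cont && $1 == c { f = 1 }
        { cont = ($0 ~ /\\[ \t]*$/) }
        END { exit !f }
    ' "$root/etc/bootparams" ||
        { echo "phase 3: no $root/etc/bootparams entry for $client"; rc=1; }

    # Phase 4: an exported path must end in the client name (assumed layout).
    awk -v c="$client" '
        $1 !~ /^#/ { n = split($1, p, "/"); if (p[n] == c) f = 1 }
        END { exit !f }
    ' "$root/etc/exports" ||
        { echo "phase 4: no export ending in /$client in $root/etc/exports"; rc=1; }

    return $rc
}
```

Running check_boot_prereqs starlite on the server prints one line for each phase whose prerequisite is missing and returns a nonzero status if any check fails. A clean result does not prove that the client can boot; it only rules out these particular omissions.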

Figure 7-1 illustrates the events in the boot process.

Figure 7-1. Diskless Boot Process


Handling Booting Error Messages

To see error messages that occur during the client's boot process, the NVRAM verbose variable must be on. To set verbose to on, use this command:

>> setenv verbose on

Table 7-2 describes common error messages that you can see on the client during the boot process and suggests how to correct them.

Table 7-2. Boot Process Troubleshooting Hints

Client Error Messages

What to Check

[ec0, enp0, et0, ipg0, fxp0]: transmit: no carrier ef, me
no server for server: /var/boot/client/unix
Unable to continue: press <enter> to return to the menu:

Check the network cable to be sure that it is securely attached.

no server for server: /var/boot/client/unix
Unable to continue: press <enter> to return to the menu:

On the client, check the NVRAM netaddr variable to ensure that it has the correct Internet address.

On the server, check the /etc/inetd.conf file to ensure that the bootp server is supported.

File /var/boot/client/unix not found on server servername
Unable to continue: press <enter> to return to the menu:

On the client, the NVRAM bootfile variable might have the wrong path or bootfile name.

On the server, check to ensure that the boot file specified in the bootfile variable on the client actually exists.

Starting up the system ...
Copyright message...
PANIC: KERNEL FAULT......

On the client, the diskless variable may not be enabled.

The kernel built for this client workstation might not be appropriate for its model type. Check the architecture variables in client.dat and share.dat to be sure the correct kernel modules have been specified.

Starting up the system...
Unable to continue: press <enter> to return to the menu:

On the server, check the /etc/inetd.conf file to ensure that the tftpd server is supported and configured correctly. The default tftpd entry runs in secure mode and supports the diskless environment. The default entry is:
tftpd -s /var/local/boot

kernel mount failed, check server, bootparams
or press reset button!!!
Get_bootp failed

On the server, check the /etc/inetd.conf file to ensure that bootparamd is supported (the bootparamd entry is present and is not preceded by the comment character, #).

GET_BOOTP: WHOAMI fail, addr=0x
Get_bootp failed
Kernel mount failed, check server, bootparams
or press reset button!!!

On the server or NIS master, check that the /etc/bootparams file contains valid entries for the client. If the client was created with NIS=yes in client.dat, the problem is with the NIS bootparams database. If the client was created with NIS=no in client.dat, the problem is with the server's local /etc/bootparams file. Verify that /var/yp/ypdomain contains an entry for the server.

sv1: missing
Portmapper not responding; still trying

 

On the server, check the HOSTNAME setting in share.dat to be sure that it specifies the network interface to the diskless LAN. If the diskless LAN is on a secondary network interface, HOSTNAME should not be set to hostname unless /etc/sys_id contains the name of the diskless LAN interface.

Kernel mount failed, check server, bootparams
or press reset button!!!
Diskless root mount failed:

On the server, check the /etc/exports file. The diskless tree filesystems (root, swap, and share) may not be accessible to clients.

nfs_rmount: SHARE MOUNT FAILED.

On the server, check the /etc/exports file. The share tree for this client may not be accessible.
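
Several entries in Table 7-2 ask you to confirm that a daemon is present in /etc/inetd.conf and not commented out. The following sketch performs that check for bootp, tftpd, and bootparamd. The function names are hypothetical, and the sketch assumes that the daemon name appears as a separate word somewhere on its inetd.conf line.

```sh
#!/bin/sh
# Sketch: report diskless-related daemons with no active inetd.conf entry.
# ASSUMPTION: each daemon name appears as a whitespace-separated word on
# its line, and disabled entries start with the comment character, #.

# Return 0 if file $2 has an uncommented line containing the word $1.
inetd_entry_enabled() {
    awk -v d="$1" '
        /^[ \t]*#/ { next }                       # skip commented-out entries
        { for (i = 1; i <= NF; i++) if ($i == d) found = 1 }
        END { exit !found }
    ' "$2"
}

# Check the three daemons named in this chapter against file $1.
check_diskless_daemons() {
    rc=0
    for d in bootp tftpd bootparamd; do
        inetd_entry_enabled "$d" "$1" || { echo "$d: no active entry"; rc=1; }
    done
    return $rc
}
```

For example, check_diskless_daemons /etc/inetd.conf prints nothing when all three entries are active. If you edit the file to enable an entry, remember that inetd rereads its configuration only when it receives a hangup signal.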


Before You Call for Help

Silicon Graphics support organizations are interested in your problems and are eager to help. However, before you call, please use the recommendations in this chapter for solving your problems independently. If your problems persist and you find it necessary to call, please have this information ready:

  • Make printed copies of the /etc/exports and /etc/bootparams files for the server (not the share tree).

  • Make printed copies of all share.dat and client.dat files in the server's /var/boot directory—one for each client class. Label them so that you can distinguish among them.

  • Make a list of all the diskless clients that you have on the network, including workstation models, graphics board types, and hostnames.
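
Gathering the files named in the first two items can be scripted. The following is a minimal sketch that copies them into one directory so that you can print them together. The function name and the optional directory prefix are hypothetical, and the sketch assumes that every share and client class file in /var/boot has a .dat suffix.

```sh
#!/bin/sh
# Sketch: copy the server files that support staff may ask for into one
# directory ($1) and list what was collected.
# ASSUMPTIONS: class definition files are /var/boot/*.dat; the optional
# second argument is a hypothetical directory prefix for working on copies.
collect_support_files() {
    dest=$1 root=${2:-}
    mkdir -p "$dest" || return 1
    cp "$root/etc/exports" "$root/etc/bootparams" "$dest"/ || return 1
    cp "$root"/var/boot/*.dat "$dest"/ || return 1
    ls "$dest"
}
```

For example, collect_support_files /tmp/diskless_report leaves copies of the files in /tmp/diskless_report, where you can label and print them. The list of clients, workstation models, and graphics board types still has to be compiled by hand.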