Chapter 6. Troubleshooting NIS

This chapter provides information for troubleshooting the NIS environment. It is divided into two parts: problems seen on an NIS server and problems seen on an NIS client. Each part describes general trouble symptoms, followed by a discussion of probable causes.

This chapter contains these sections:

  • “Debugging an NIS Server”

  • “Debugging an NIS Client”

  • “Before You Call for Help”

Debugging an NIS Server

Before trying to debug an NIS server, be sure you understand the concepts in Chapter 1, “Understanding NIS”, and Chapter 2, “Preparing to Manage NIS”, in this guide.

Different Map Versions

Since NIS works by propagating maps from the NIS master server to NIS slave servers within the same domain, you may find different versions of a map on different servers. Each time a map is updated, a new order number (map version) is attached to the map. This information can be obtained with the yppoll command.
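The comparison of order numbers between servers can be scripted. The following is a minimal sketch, not part of NIS: the shell function names (order_number, compare_versions) are hypothetical, and the sketch assumes that yppoll accepts a -h host option and reports the version in a line of the form “Map name has order number n.” Check the yppoll(1M) man page on your system before relying on either assumption.

```shell
#!/bin/sh
# Hypothetical sketch: compare a map's order number (map version) on the
# NIS master and on a slave. ASSUMPTIONS: yppoll accepts "-h host" and
# prints "Map <name> has order number <n>."; verify against yppoll(1M).

# order_number: print the order number found in yppoll output on stdin.
order_number() {
    sed -n 's/.*has order number \([0-9][0-9]*\).*/\1/p'
}

# compare_versions MAP MASTER SLAVE: return 0 if both servers report the
# same order number, 1 if they differ, 2 if either could not be read.
compare_versions() {
    m=$(yppoll -h "$2" "$1" 2>/dev/null | order_number)
    s=$(yppoll -h "$3" "$1" 2>/dev/null | order_number)
    if [ -z "$m" ] || [ -z "$s" ]; then
        echo "$1: could not read order number" >&2
        return 2
    fi
    if [ "$m" = "$s" ]; then
        echo "$1: in sync (order number $m)"
        return 0
    fi
    echo "$1: out of sync (master $m, slave $s)"
    return 1
}
```

For example, compare_versions hosts.byname circles squares reports whether the slave squares holds the same version of hosts.byname as the master circles. A mismatch is expected for a short time while a map is being propagated.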

Version skew, or out-of-sync maps, between servers is normal when maps are being propagated from the NIS master server to the slave servers. However, when the maps on different servers remain unsynchronized even after the NIS environment has stabilized, it usually indicates a problem.

The normal update of NIS maps is prevented when an NIS server or some gateway system between the NIS master server and NIS slave servers is down during a map transfer attempt. This condition is the most frequent cause of out-of-sync maps on servers. Normal update procedures are described in Chapter 5, “Maintaining NIS”. When all the NIS servers and all the gateways between the NIS master and NIS slave servers are up and running, ypxfr should successfully transfer maps and all NIS servers' maps should be in sync.

The next section describes how to run ypxfr manually to update NIS maps. If ypxfr transfers maps successfully when you run it manually but automatic transfers still fail intermittently, further investigation is needed, as described in the section “Intermittent, Consistent Map Propagation Failures”.

Isolated, One-Time Map Propagation Failures

If a particular slave server has an isolated, one-time problem updating a particular map or its entire map set, follow these steps to resolve the problem by running ypxfr manually:

  1. ypxfr requires a complete map name rather than a nickname, so first list the complete names of the maps in your domain with this command:

    # ypwhich -m
    

    The system returns a list of complete map names and the name of the NIS master server for each map. Output should be similar to this output for an NIS master server named circles:

    ypservers circles
    netid.byname circles
    bootparams circles
    mail.aliases circles
    netgroup.byhost circles
    netgroup.byuser circles
    netgroup circles
    protocols.byname circles
    protocols.bynumber circles
    services.byname circles
    rpc.bynumber circles
    networks.byaddr circles
    networks.byname circles
    ethers.byname circles
    ethers.byaddr circles
    hosts.byaddr circles
    hosts.byname circles
    group.bygid circles
    group.byname circles
    passwd.byuid circles
    passwd.byname circles
    mail.byaddr circles
    

  2. For each map that is not being updated, transfer the map manually using ypxfr:

    # ypxfr -f map.name
    

    map.name is the complete name of the map, for example, hosts.byname.

    If ypxfr fails, it supplies an error message that points you to the problem. If it succeeds, you should see output similar to this:

    Transferred map hosts.byname from NIS_master (1091 entries).
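When several maps need transferring, the two steps above can be combined in a short script. This is a hypothetical sketch: map_names and transfer_maps are illustrative names, and the sketch assumes that ypxfr exits with a nonzero status when a transfer fails.

```shell
#!/bin/sh
# Hypothetical sketch: run "ypxfr -f" by hand for several maps in one pass.
# ASSUMPTION: ypxfr exits with a nonzero status when a transfer fails.

# map_names: reduce "ypwhich -m" output ("map master" per line) to map names.
map_names() {
    awk 'NF > 0 { print $1 }'
}

# transfer_maps MAP...: transfer each map; return 1 if any transfer failed.
transfer_maps() {
    failed=0
    for map in "$@"; do
        if ! ypxfr -f "$map"; then
            echo "ypxfr failed for $map" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, transfer_maps hosts.byname hosts.byaddr transfers two maps, while transfer_maps $(ypwhich -m | map_names) transfers every map in the domain.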
    

Intermittent, Consistent Map Propagation Failures

This section describes several procedures you can use to help isolate intermittent map propagation problems.

If the error message “Transfer not done: master's version isn't newer” appears, check that the system dates on the master and slave servers agree.

On the NIS master server, verify that the NIS slave server is listed in the ypservers map for the domain. If the slave server is not in the ypservers map, the master server does not know to push changed maps to it automatically. The slave server can still obtain updated maps if its crontab file runs ypxfr to request them from the master server, but in that case the transfer is initiated by the slave server, not by the NIS master server. These steps show how to verify the ypservers map:

  1. Review the contents of the ASCII file used to create the ypservers map:

    # cat /var/yp/ypservers 
    

    If the server is not listed, add the server's name using any standard editor.

  2. After editing /var/yp/ypservers (if it needed changes), rebuild the actual map on the master server. The ypservers map is special: no attempt is made to push it to the other servers. Give this command:

    # /var/yp/ypmake -f ypservers
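The check in step 1 can also be scripted. The sketch below is hypothetical (in_ypservers and add_ypserver are illustrative names); it assumes that /var/yp/ypservers holds one server name per line, and it leaves rebuilding the map with ypmake to you.

```shell
#!/bin/sh
# Hypothetical sketch: make sure a slave server is listed in the
# ypservers source file on the master (one server name per line).

# in_ypservers SERVER [FILE]: succeed if SERVER is the first field of a line.
in_ypservers() {
    awk -v s="$1" '$1 == s { found = 1 } END { exit !found }' \
        "${2:-/var/yp/ypservers}"
}

# add_ypserver SERVER [FILE]: append SERVER only if it is missing.
add_ypserver() {
    file=${2:-/var/yp/ypservers}
    if in_ypservers "$1" "$file"; then
        echo "$1 is already listed in $file"
    else
        echo "$1" >> "$file"
        echo "added $1; now run: /var/yp/ypmake -f ypservers"
    fi
}
```

For example, add_ypserver squares adds the slave server squares if it is not already listed.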
    

Another possible cause of out-of-sync maps is a faulty ypxfr script. Inspect root's crontab file (/var/spool/cron/crontabs/root) and the ypxfr shell scripts it invokes (/var/yp/ypxfr_1ph, /var/yp/ypxfr_1pd, and /var/yp/ypxfr_2pd). Typographical errors in these files can cause propagation problems, as can a crontab that fails to invoke one of the shell scripts or a shell script that fails to name a map. Also use the chkconfig command to ensure that the yp and nsd configuration flags are on. For details, see the chkconfig(1M) man page.
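A quick consistency check for the crontab entries and scripts named above might look like this. It is a sketch only: check_ypxfr_cron is a hypothetical name, and the check confirms only that each script is referenced by an uncommented crontab line and is executable. It cannot detect typographical errors inside the scripts, and it does not check the chkconfig flags.

```shell
#!/bin/sh
# Hypothetical sketch: check that root's crontab invokes each ypxfr
# script and that each script exists and is executable.

# check_ypxfr_cron [CRONTAB] [SCRIPT_DIR]: return 1 if anything is missing.
check_ypxfr_cron() {
    tab=${1:-/var/spool/cron/crontabs/root}
    dir=${2:-/var/yp}
    status=0
    for s in ypxfr_1ph ypxfr_1pd ypxfr_2pd; do
        # Ignore commented-out crontab lines.
        if grep -v '^[[:space:]]*#' "$tab" | grep -q "$s"; then
            echo "$s: referenced in $tab"
        else
            echo "$s: NOT referenced in $tab"
            status=1
        fi
        if [ ! -x "$dir/$s" ]; then
            echo "$s: missing or not executable in $dir"
            status=1
        fi
    done
    return "$status"
}
```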

Finally, if the above suggestions do not solve the intermittent map propagation problem, you need to monitor the ypxfr process over a period of time. These steps show how to set up and use the ypxfr log file:

  1. Create a log file to enable message logging. Enter these commands:

    # cd /var/yp 
    # touch ypxfr.log 
    

    This saves all output from ypxfr. The output looks much like the output from ypxfr when run interactively, but each line in the log file is timestamped. You may see unusual ordering in the timestamps. This is normal: the timestamp records when ypxfr began its work. If several copies of ypxfr ran simultaneously but took differing amounts of time, they may write their summary status lines to the log file in an order different from the order in which they were invoked.

    Any pattern of intermittent failure shows up in the log. Look at the messages to determine what is needed to fix the failure. You know that you have fixed it when you no longer receive failure messages.

  2. When you have fixed the problem, turn off message logging by removing the log file. Give this command:

    # rm ypxfr.log 
    


    Note:  If you forget to remove the log file, the log file grows without limit.
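While logging is enabled, a small filter can pull the failures out of the log. This sketch assumes, as described in step 1, that log entries resemble interactive ypxfr output, so successful transfers contain the words “Transferred map”. The names log_failures and count_failures are hypothetical; adjust the pattern to match what your log actually contains.

```shell
#!/bin/sh
# Hypothetical sketch: report ypxfr.log entries that are not successful
# transfers. ASSUMPTION: success lines contain "Transferred map".

# log_failures [FILE]: print every line that does not report a success.
log_failures() {
    grep -v 'Transferred map' "${1:-/var/yp/ypxfr.log}" || true
}

# count_failures [FILE]: print the number of non-success lines.
count_failures() {
    log_failures "$1" | wc -l | tr -d ' '
}
```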


As a last resort, and while you continue to debug, you can copy a recent version of the map from any healthy NIS server with the remote file copy command, rcp. You may not be able to do this as root, but you can probably do it by using the guest account on the master server. For instance, to copy the map hosts in the domain shapes.com from the master server circles to the slave server squares, enter this command on squares:

# rcp guest@circles:/var/ns/domains/shapes.com/hosts.\* \
/var/ns/domains/shapes.com 

The asterisk is escaped (\*) so that the local shell does not expand it; the wildcard is expanded on circles instead, so all mdbm record files for the hosts map are copied.

nsd Fails

If nsd fails almost immediately each time it is started, look for a more general networking problem. Because NIS uses Remote Procedure Calls (RPC), the portmapper must be functioning correctly for NIS to work.

To verify that the portmapper is functioning and that nsd has registered the NIS service (ypserv) with the portmapper, enter this command on the server:


# /usr/etc/rpcinfo -p | grep ypserv

If your portmap daemon is functional, the output looks something like this:

100004    2    udp    1051    ypserv 
100004    2    tcp    1027    ypserv 

If these entries are not in your output, nsd has been unable to register its services with the portmap daemon. If the portmap daemon has failed or is not running, you get this error message:

rpcinfo: can't contact portmapper: Remote system error - connection refused

If the information returned by rpcinfo does not match the information shown above or if the error message is returned, reboot the server. Rebooting the server ensures that the network daemons, specifically portmap and nsd, are started in the correct order. See the nsd(1M), portmap(1M), and rpcinfo(1M) man pages for further details.
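If you check the portmapper repeatedly, the rpcinfo test can be reduced to a pass/fail result. This sketch (ypserv_registered is a hypothetical name) succeeds only when program 100004, shown as ypserv in the sample output above, is registered for both udp and tcp.

```shell
#!/bin/sh
# Hypothetical sketch: read "rpcinfo -p" output on stdin and succeed only
# if program 100004 (ypserv) is registered over both udp and tcp.
ypserv_registered() {
    awk '$1 == 100004 && $3 == "udp" { u = 1 }
         $1 == 100004 && $3 == "tcp" { t = 1 }
         END { exit !(u && t) }'
}
```

For example, /usr/etc/rpcinfo -p | ypserv_registered || echo "ypserv not registered". If the portmapper cannot be contacted, rpcinfo writes its error to standard error, the function sees no entries, and the check fails.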

Debugging an NIS Client

Before trying to debug an NIS client, be sure you understand the concepts in Chapter 1, “Understanding NIS”, and Chapter 2, “Preparing to Manage NIS”, in this guide.

Command Hangs

The most common problem on an NIS client is for a command to hang and generate SYSLOG messages such as this:


NIS v.2 server not responding for domain domain_name; still trying

Sometimes many commands begin to hang, even though the system as a whole seems to be working and you can run new commands.

The message above indicates that nsd on the local system is unable to communicate with nsd on any NIS server in the domain domain_name. This can happen in any of these situations:

  • The network has been disconnected on the NIS client; for example, the Ethernet cable is unplugged.

  • An incorrect domain name has been specified.

  • The network or the NIS server is so overloaded that the server's nsd cannot get a response back to the client's nsd within the time-out period.

  • nsd on the NIS server has crashed.

  • The NIS server has crashed or is unreachable via the network.

  • There is a physical impairment on the local area network. Under these circumstances, all the other NIS clients on the same local area network should show the same or similar problems.

A heavily loaded network or NIS server may be a temporary condition that resolves itself without any intervention. In some circumstances, however, the situation does not improve on its own. If intervention becomes necessary, the following four questions help to isolate and correct the problem.

Question 1: Is the client attached to the network?  

Typically, if there is a problem with the physical connection from the client to the network, a message similar to this appears in the console window on the system:

ec0: no carrier: check Ethernet cable

If NIS commands hang and you have the message shown above, verify that the physical connection from the client to the local area network is secure and functioning. If you do not know how to check your physical connection, see the Owner's Guide for your system for more details. Also check to ensure that the client is attached to the correct physical network.

Question 2: Does the client have the correct domain set?  

Clients and servers must use the same domain name if they want to belong to the same domain. Servers supply information only to clients within their domain. The domain names must match exactly. The domain shapes.com is not the same as the domain SHAPES.com. Clients must use a domain name that the NIS servers for their domain recognize.

Verify the client's current domain name by giving the domainname command and by looking at the contents of the file /var/yp/ypdomain, which is read at system startup. Perform these steps to determine the client's current domain:

  1. Determine the current domain name:

    # domainname
    current_domain_name 
    

  2. Look at /var/yp/ypdomain to determine the domain name set at system startup:

    # cat /var/yp/ypdomain
    current_domain_name 
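The two steps above can be combined into a single comparison. In this sketch, check_domain is a hypothetical name; it compares the running domain name with the first line of /var/yp/ypdomain exactly, so a difference in case (shapes.com versus SHAPES.com) is reported as a mismatch. You still need to compare the result with the domain name used on the servers.

```shell
#!/bin/sh
# Hypothetical sketch: compare the running domain name with the one
# stored in /var/yp/ypdomain for use at the next boot.

# check_domain [FILE]: return 1 if the two names differ (case-sensitive).
check_domain() {
    current=$(domainname)
    # Only the first line of the file matters; strip stray whitespace.
    boot=$(head -n 1 "${1:-/var/yp/ypdomain}" | tr -d '[:space:]')
    if [ "$current" = "$boot" ]; then
        echo "domain OK: $current"
    else
        echo "mismatch: running '$current', startup file '$boot'"
        return 1
    fi
}
```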
    

Compare these values to those found on the servers. If the domain name on the client differs from the domain name on the server, change the domain on the client:

  1. Using any standard editor, edit /var/yp/ypdomain to reflect the correct domain name. This file ensures that the domain name is set correctly every time the client boots. There should be only one entry in this file:

    correct_domain_name 
    

  2. Set the domain name by hand so that the change takes effect immediately. Give this command:

    # domainname correct_domain_name
    

  3. Restart nsd so that the client binds within the correct domain. Give this command:

    # /etc/killall -HUP nsd
    

Question 3: Do you have enough NIS servers?  

NIS servers do not have to be dedicated systems, and multipurpose systems are susceptible to sudden increases in load. If an NIS server is overloaded, the client's nsd process automatically switches to another, less heavily loaded server. Check that the designated servers are functioning and accessible via the network.

By default, when an NIS client boots it can only bind to a server that resides on the same local network. It cannot bind to a server that resides on a remote network. There must be at least one NIS server running on the local network in order for a client in the same domain to bind. Two or more NIS servers per local network improve availability and response characteristics for NIS services.

Question 4: Are the NIS servers up and running?  

Check other clients on your local network. If several client systems have NIS-related problems simultaneously, suspect the NIS server. It may be that the NIS server system is down or inaccessible or that the nsd process has crashed on the NIS server.

If an NIS server crashes or becomes unavailable and there are multiple NIS servers on the network, NIS service should not be affected, because the clients automatically switch to another server. If there is only one server on the network, check that the server is up by logging in to it remotely.

If the server is up, the problem may be that the nsd process has crashed on the server. Enter these commands to find out if nsd is running and restart it if it is not:

  1. Log in to the NIS server system and look for nsd processes. Give this command:

    # ps -ef | grep nsd
    

    You should see output similar to this:

    root   128     1  0  Sep 13  ?        1:35 /usr/etc/nsd 
    

  2. If the server's nsd daemon is not running, start it up by typing:

    # nsadmin restart
    

  3. Give the command ypwhich on the NIS server system:

    # ypwhich
    

    If ypwhich returns no answer, nsd is probably not working.

  4. If nsd is not working, give this command to kill the existing nsd process and start a new one:

    # nsadmin restart
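Steps 1 and 2 can be combined so that nsd is restarted only when no nsd process is found. This is a sketch: nsd_running is a hypothetical name, and it matches a process whose command is nsd (for example, /usr/etc/nsd) while ignoring the grep line that a plain ps -ef | grep nsd also prints. It does not replace the ypwhich check in step 3, because a running nsd may still fail to answer.

```shell
#!/bin/sh
# Hypothetical sketch: read "ps -ef" output on stdin and succeed if an
# nsd process is listed. Lines containing "grep" are ignored.
nsd_running() {
    awk '!/grep/ && $0 ~ "(^|[ /])nsd( |$)" { found = 1 }
         END { exit !found }'
}

# Usage on the server:
#   ps -ef | nsd_running || nsadmin restart
```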
    

NIS Command Fails

Another problem that can occur on an NIS client is for a command to fail because of a problem with the NIS daemon, nsd. These examples illustrate typical error messages you might see when you give an NIS command and nsd has failed:

# ypcat hosts 
ypcat: can't bind to NIS server for domain domain_name.
Reason: can't communicate with nsd. 
# yppoll aliases
Sorry, I can't make use of the NIS. I give up.

In addition to the preceding error messages, these general symptoms may also indicate that the nsd process has crashed:

  • Some commands appear to operate correctly while others terminate, printing an error message about the unavailability of NIS.

  • Some commands work slowly in a backup-strategy mode peculiar to the program involved.

  • Some commands do not work and/or daemons crash with obscure messages or no message at all.

To correct this situation, stop and restart the nsd process on the client with the following command:

# nsadmin restart

Give this command to verify that the nsd process is running:

# ps -ef | grep nsd

You should see output similar to this:

root 26995     1  0 17:35:31 ?         0:00 /usr/etc/nsd

ypwhich Output Inconsistent

When you enter the ypwhich command several times on the same client, the answer you receive may vary because the client has bound to a different NIS server. This is normal. The binding of an NIS client to an NIS server changes over time on a busy network and when the NIS servers are busy. Whenever possible, the system stabilizes at a point where all clients get acceptable response time from the NIS servers. As long as the client gets NIS service, it does not matter which server provides it. An NIS server may itself get its NIS services from another NIS server on the network.
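To observe this behavior on a client, you can sample ypwhich several times. This sketch (sample_bindings is a hypothetical name) runs ypwhich a given number of times, pausing between runs, and counts how often each server answered; more than one server name in the result is normal.

```shell
#!/bin/sh
# Hypothetical sketch: sample the client's NIS binding and count how
# often each server name was returned by ypwhich.

# sample_bindings [COUNT] [DELAY]: run ypwhich COUNT times, DELAY seconds apart.
sample_bindings() {
    n=${1:-5}
    delay=${2:-1}
    i=0
    while [ "$i" -lt "$n" ]; do
        ypwhich
        i=$((i + 1))
        sleep "$delay"
    done | sort | uniq -c
}
```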

Before You Call for Help

Before you call your support provider, please use the recommendations in this chapter for solving your problems independently. If your problems persist and you find it necessary to call, please have this information ready:

  • System serial number.

  • Operating system and NFS version numbers (from the versions command). Include the eoe and nfs products.

  • A specific description of the problem. Write down and be prepared to provide any error messages that might help in isolating the problem.

  • Are there other vendors' systems involved?

  • What does the physical layout look like? Are there gateways?

  • How many slave servers do you have per network?

  • What are the names of the master server, slave server(s), and domain?

  • How many systems are in your domain?

  • Do you have multiple domains?