Chapter 5. Troubleshooting

This chapter discusses some troubleshooting tips for problems arising with peripheral devices. It covers modem setup, the lp printing system, the BSD lpr spooling system, inaccessible tape drives, and tape read errors.

Troubleshooting Your Modem Setup

If you have problems with the cu dial-out process, use the -d option to cu to print diagnostic messages to your system console, and the -l option to connect directly to the modem (this requires the “Direct” statement in the Devices file).

To test the modem connection on port 2, type

cu -d -lttyd2

  • The Connected message should appear on the console. Type AT; the modem should respond with OK. If it does not, the modem is not configured correctly or there is a problem with the cable.

  • If Connected does not display, check the debugging messages to determine where the connection failed.

Additionally, double-check that all your hardware connections are secure and that you are using a Silicon Graphics modem cable or one made to the specifications described in your Owner's Guide.

Troubleshooting Your Printing System

If you send a print request to a printer with lp and do not receive any output, use the checklists below to make sure your system is ready for printing. These lists supplement the troubleshooting information in the manufacturer's hardware manual.  

Hardware Troubleshooting Checklist

Use the following list of questions to determine whether your printer hardware is working as designed:

  • Is the printer turned on?

    Printers do not always indicate clearly whether they are turned on. Make sure the printer is plugged into the power socket and the power switch is on.

  • Does the printer have paper?

    Frequently, printers run out of paper in a high-volume situation.

  • Is there a paper jam?

    Make sure the entire paper pathway is clear of sheets or fragments of paper. Refer to your printer hardware documentation before attempting to put any unusual paper or other media through your printer.

  • Is the printer set to the correct baud rate?

    Be sure the baud rate of the printer matches that of the serial port.

  • Is the serial cable attached correctly?

    Often, reseating the serial cable where it connects to the printer restores correct operation.

  • Is the correct cable being used?

    The use of the pins in serial cables varies somewhat in different applications. Cables designed for specific hardware may or may not function correctly with different hardware. Check your system Owner's Guide and the documentation supplied with your printer and cable to determine whether the cable is correct for your hardware.

Software Troubleshooting Checklist

The lp scheduler is the program in charge of spooling your files to the printer, and it is invoked whenever you use the lp print command. The scheduler can be in a number of states, and each printer registered with lp can be in a number of states as well.

To check on the complete status of the lp system, type

lpstat -t 

The output describes the state of the scheduler and of the printers registered with lp. You may also want to examine the contents of the file /var/spool/lp/log. Use the information you find to answer the following questions:

  • Is your printer registered with lp?

    If you do not see the name of your printer in the list of information produced by lpstat, then you must register your printer with lp.

  • Is the printer enabled?

    If your printer is not enabled, the lpstat listing contains this line:

    printer yourprinter disabled since...
    

    To enable the printer, type

    enable yourprinter 
    

    lp sometimes disables a printer automatically if it is unable to send a file to a print server, so a disabled printer often indicates a hardware problem, such as a host that is not communicating with the network.

  • Is the printer accepting requests?

    If the printer is not accepting requests, the lpstat listing contains this line:

    yourprinter not accepting requests since...
    

    You must use the accept command for that printer destination. Become the superuser (with su) and type

    /usr/lib/accept yourprinter 
    

  • Is the lp scheduler running?

    If the scheduler is not running, the lpstat listing contains the message

    scheduler is not running 
    

    To restart the lp scheduler, become superuser (with su) and type

    /usr/lib/lpsched 
    

  • Did you specify the right printer?

    If your system has more than one printer, and you wish to send a job to a printer other than the default, remember to use the -d option:

    lp -dotherprinter 
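
The checklist above can be partly automated. The following is a minimal sketch, not a definitive tool: it assumes that lpstat -t prints the status phrases quoted in this checklist, and that your awk accepts the -v option. The function name lp_diagnose and the wording of the hints are illustrative only.

```shell
#!/bin/sh
# lp_diagnose PRINTER: read `lpstat -t` output on standard input and
# print one hint for each problem found for PRINTER.
# The matched phrases are the ones quoted in the checklist; the hint
# text and the function name are illustrative assumptions.
lp_diagnose() {
    awk -v p="$1" '
        /scheduler is not running/ {
            print "scheduler: not running; as root, run /usr/lib/lpsched"
        }
        # Line of the form: printer NAME disabled since ...
        $1 == "printer" && $2 == p {
            seen = 1
            if ($3 == "disabled")
                print p ": disabled; run enable " p
        }
        # Line of the form: NAME not accepting requests since ...
        $1 == p {
            seen = 1
            if ($2 == "not" && $3 == "accepting")
                print p ": not accepting requests; as root, run /usr/lib/accept " p
        }
        END {
            if (!seen)
                print p ": not listed; register the printer with lp"
        }
    '
}

# Example use:
#   lpstat -t | lp_diagnose yourprinter
```

The function prints nothing when it finds none of these conditions, so an empty result means the problem lies elsewhere, for example in the hardware checklist.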
    

Troubleshooting Network Printers

If you are having trouble with a printer you are accessing over a network, check the status of the lp scheduler on your workstation or the print server's host system.

Emergency Measures

If none of the procedures above solves the problem, try the following “last resort” procedures in order:

  1. Stop the lp scheduler and then restart it. As root, type the following sequence of commands:

    /usr/lib/lpshut 
    

    Then kill any jobs running as lp. You can identify these processes with the command

     ps -fu lp

    Then type the command

    /usr/lib/lpsched 
    

  2. Remove the offending printer destination from the lp scheduler and then register it again. Before you can do this you must either cancel any print requests going to the printer or move them to another print destination (if you have more than one).

  3. As an absolute last resort, remove all printers from the lp system, reboot the computer, and register them all once again.
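
For the first procedure, the process IDs of the jobs running as lp can be extracted from the ps listing. The sketch below assumes the usual System V -f layout, in which the first line is a header and the process ID is the second column. The function name lp_pids is illustrative.

```shell
#!/bin/sh
# lp_pids: read `ps -fu lp` output on standard input and print the
# process IDs, one per line. Assumes the first line is a header and
# that PID is the second column; lines without a numeric second
# column are ignored.
lp_pids() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Example use, after lpshut and before lpsched:
#   ps -fu lp | lp_pids
```

Review the list before passing any of the IDs to kill, so that you stop only the processes you intend to stop.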

Troubleshooting the BSD lpr Spooling System

If your print request does not make it to the queue, then

  • Check for error messages.

  • Double-check the command that you entered.

  • Try submitting the /etc/group file to the queue.

    The file you originally submitted may not be in a format that the print server can print. The /etc/group file is a short plain-text file, which makes it a convenient test.

If your print request makes it to the queue and never gets to the print server, then

  • Do you have the print server system's IP address and hostname in the /etc/hosts file?

  • Does the print server system name you are using match the name in the /etc/hosts file, and do both match the hostname of the print server system?

  • Did you get this error message?

    Waiting for remote queue to be enabled.

    This message usually means that your hostname is not in the print server system's /etc/hosts.equiv file.

If your print request disappears from the queue and does not print, or prints incorrect information, then

    1. Become root and enter the commands:

      /usr/etc/lpc stop lp        # or your printer name
      lpr /etc/group
      cd /usr/spool/lpd           # or your spool directory
      ls -l
      

      Your system should return something similar to

      -rw-rw---- 1 root lp 69 Aug 23 14:02 cfA117tls 
      -rw-rw---- 1 root lp 227 Aug 23 14:02 dfA117tls 
      -rwxr----- 1 root lp 0 Aug 23 14:01 lock 
      -rw-rw-r-- 1 root lp 25 Aug 23 14:46 status
      

    2. Check the contents of the control file with the following command:

      cat cfA117tls 
      

      Your system should return something similar to

      Htls           H  the hostname that sent the print request
      Proot          P  the person who sent the request
      Jgroup         J  the jobname
      Ctls           C  class/hostname
      Lroot          L  the person who sent the request
      fdfA117tls     f  name of the file to print
      UdfA117tls     U  name of the file to remove after printing
      N/etc/group    N  the original file name
      

    3. Check the copy of the print file.

      It is recommended that you use the more command just in case your test file is not as short as the /etc/group file. The df file should look exactly like the file you attempted to print. In this case, the file dfA117tls should be exactly the same as the /etc/group file.

      more dfA117tls 
      

      The system should return something similar to

      sys::0:root,bin,sys,adm 
      root::0:root 
      daemon::1:root,daemon 
      bin::2:root,bin,daemon 
      adm::3:root,adm,daemon 
      mail::4:root 
      uucp::5:uucp 
      rje::8:rje,shqer 
      lp::9: 
      nuucp::10:nuucp 
      user::20: 
      other::995: 
      demos:*:997: 
      guest:*:998:
      

      Now that you have verified that the request is properly spooling on the local system, check the print server system. You may need to contact the System Administrator of the print server system first; you need the root password. Once you enter the stop command on that system, no print requests are printed. Instead, they remain in the queue. Make sure that there are no requests in the queue that are currently printing.

    4. On the print server system, log in as root and enter the command

      /usr/etc/lpc stop lp 
      

    5. On the local system, enter the command

      /usr/etc/lpc start lp 
      

    6. On the print server system, cd to the spool directory.

      If you do not know where the spool directory is, use the cat or more command with the /etc/printcap file to look at what is set in the sd: variable.

    7. On the print server system (after step 6), enter the following command:

      ls -l 
      

      The print server system should return something similar to

      -rw-r----x 1 root 4 Aug 15 10:27 .seq 
      -rw-rw---- 1 root 69 Aug 23 14:02 cfA117tls.csd.sgi.com 
      -rw-rw---- 1 root 227 Aug 23 14:02 dfA117tls 
      -rwxr----- 1 root 0 Aug 23 14:01 lock 
      -rw-rw-r-- 1 root 25 Aug 23 14:46 status
      

    8. Check the contents of the control file.

      cat cfA117tls.csd.sgi.com 
      

      The print server system should return something similar to

      Htls           H  the hostname that sent the print request
      Proot          P  the person who sent the request
      Jgroup         J  the jobname
      Ctls           C  class/hostname
      Lroot          L  the person who sent the request
      fdfA117tls     f  name of the file to print
      UdfA117tls     U  name of the file to remove after printing
      N/etc/group    N  the original file name
      

    9. Examine the df* file by entering the following command:

      more dfA117tls 
      

      The system should return something similar to

      sys::0:root,bin,sys,adm 
      root::0:root 
      daemon::1:root,daemon 
      bin::2:root,bin,daemon 
      adm::3:root,adm,daemon 
      mail::4:root 
      uucp::5:uucp 
      rje::8:rje,shqer 
      lp::9: 
      nuucp::10:nuucp 
      user::20:
      other::995: 
      demos:*:997: 
      guest:*:998:
      

      The df file should look exactly like the file you attempted to print. In this case, the print server system's dfA117tls file should be exactly the same as the dfA117tls file that was on your system.

    10. On the print server system, enter the following command:

      /usr/etc/lpc start lp 
      

      Your file should now print on the printer. It should look exactly like the output of the more command. If it does not, then contact the System Administrator of the print server system.
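
The control files examined in this procedure begin each line with a one-letter key. As a reading aid, the following sketch labels each line using only the keys described in this section. The function name cf_decode and the label wording are illustrative, and keys not listed in this section are reported as unrecognized rather than guessed.

```shell
#!/bin/sh
# cf_decode: read a control file (cf*) on standard input and print
# each line as "label: value". Only the key letters described in this
# section are labeled; any other key is reported as unrecognized.
cf_decode() {
    while IFS= read -r line; do
        val=${line#?}           # everything after the first character
        key=${line%"$val"}      # the first character (the key letter)
        case $key in
            H) label='sending host' ;;
            P) label='requesting user' ;;
            J) label='job name' ;;
            C) label='class/hostname' ;;
            L) label='requesting user (L line)' ;;
            f) label='file to print' ;;
            U) label='file to remove after printing' ;;
            N) label='original file name' ;;
            *) label='unrecognized' ;;
        esac
        printf '%s: %s\n' "$label" "$val"
    done
}

# Example use, in the spool directory:
#   cf_decode < cfA117tls
```

To compare a df file with the file you submitted, you can use cmp (for example, cmp dfA117tls /etc/group), which prints nothing when the two files are identical.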

Troubleshooting Inaccessible Tape Drives


Note: This section does not allow for customized installations and does not address complex multiple tape drive issues. Take care not to violate your maintenance agreements.


Checking the Hardware

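Use the hinv command to see whether the operating system recognized the tape drive at boot time. This is one of the most basic and critical hardware tests. The hinv command returns output similar to the following. (This sample is from a system on which no tape drive was recognized; a recognized drive appears as a line such as Tape drive: unit 3 on SCSI controller 0: DAT.)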

Iris Audio Processor: version A2 revision 4.1.0
1 100 MHZ IP22 Processor
FPU: MIPS R4010 Floating Point Chip Revision: 0.0
CPU: MIPS R4000 Processor Chip Revision: 3.0
On-board serial ports: 2
On-board bi-directional parallel port
Data cache size: 8 Kbytes
Instruction cache size: 8 Kbytes
Secondary unified instruction/data cache size: 1 Mbyte
Main memory size: 64 Mbytes
Integral Ethernet: ec0, version 1
Integral SCSI controller 0: Version WD33C93B, revision D
CDROM: unit 4 on SCSI controller 0
Disk drive: unit 1 on SCSI controller 0
Graphics board: Indy 24-bit
Vino video: unit 0, revision 0, Indycam connected

If hinv does not report an attached tape drive, then your operating system cannot use the drive. You need to check the installation of the hardware. What you can do at this time depends on your maintenance support agreements.

Simple hardware checks are

  • If the tape drive is an external unit, does it have power? Simply powering it on does not cause it to be seen by the computer. The system must be shut down, power cycled, then rebooted.

  • During the boot phase, does the access light on the tape drive light up? If it does not flash at all, chances are the operating system is still not seeing the drive.

  • Are the SCSI cabling and termination correct? If visual inspection shows nothing obvious, try reseating the connectors. Power off the system before you move any hardware or cabling.

If none of the above causes hinv to report the tape drive, then the most likely problem is faulty hardware. Contact your support provider.
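
The hinv check can be repeated quickly after each hardware change with a small filter. This is a sketch only: it assumes that hinv reports a tape drive on a line of the form Tape drive: unit 3 on SCSI controller 0: DAT, as in the example later in this chapter. The function name tape_units is illustrative.

```shell
#!/bin/sh
# tape_units: read hinv output on standard input and print
# "CONTROLLER UNIT" for every tape drive line found.
# Assumes lines of the form:
#   Tape drive: unit 3 on SCSI controller 0: DAT
tape_units() {
    sed -n 's/^Tape drive: unit \([0-9][0-9]*\) on SCSI controller \([0-9][0-9]*\).*/\2 \1/p'
}

# Example use:
#   hinv | tape_units
```

No output means that hinv did not report a tape drive in this form, which corresponds to the case described above where the operating system cannot use the drive.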

Checking the Software

If you are reasonably sure the tape drive is correctly installed on the computer, but your software does not seem to be able to use it, the tape device's SCSI address may have changed when other SCSI devices were added to your system.

The system assumes that if /dev/nrtape exists and appears to be a tape drive of some kind, then it does not need to remake the default tape drive links of /dev/tape, /dev/nrtape, and so on. It also assumes that the first tape drive that it finds is the main tape drive. It searches for devices starting at the highest SCSI ID numbers, so the tape device on SCSI ID 7 gets the default links before a tape device on SCSI ID 3.

The default tape drive for most commands is /dev/tape. If the tape drive installation proceeded correctly, you should have at least /dev/tape and /dev/nrtape special device files. You may have several others, depending on the type of tape drive.

The mt command can be used to confirm that /dev/tape exists and that the tape drive is responding. Output similar to the following from the mt status command confirms both:

Controller: SCSI
Device: ARCHIVE: Python 25601-XXX2.63
Status: 0x20262
Drive type: DAT
Media : READY, writable, at BOT

The following output means that you have another process accessing the drive right now:

/dev/nrtape: Device or resource busy

The following output appears when a special device file does not exist:

/dev/nrtape: No such file or directory

The output when a device file exists, but no hardware is responding at that address, is

/dev/nrtape: No such device
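
The four cases above can be told apart by a short filter. The sketch below matches only the messages quoted in this section; the function name mt_diagnose and the diagnosis wording are illustrative, and any other output is reported as unknown rather than interpreted.

```shell
#!/bin/sh
# mt_diagnose: read the output of `mt status` (standard output and
# standard error combined) on standard input and print a one-line
# diagnosis based on the messages quoted in this section.
mt_diagnose() {
    awk '
        /Device or resource busy/   { r = "busy: another process is using the drive" }
        /No such file or directory/ { r = "missing: the special device file does not exist" }
        /No such device/            { r = "absent: no hardware responds at that address" }
        /^Media *: *READY/          { r = "ready: the drive is responding" }
        END {
            if (r == "")
                print "unknown: output not recognized"
            else
                print r
        }
    '
}

# Example use:
#   mt status 2>&1 | mt_diagnose
```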

If the hardware appears to be present, but /dev/tape does not appear to be valid, confirm the file links. Take the device unit number from the hinv output:

Tape drive: unit 3 on SCSI controller 0: DAT

In this example the device unit number is 3 (this is likely to be different on your system). Use the following ls command to confirm that /dev/tape is linked to the correct device (change the numeral 3 to the correct numeral for your drive):

ls -l /dev/tape /dev/mt/tps0d3* 
crw-rw-rw- 2 root sys 23, 96 Sep 21 11:11 /dev/mt/tps0d3
crw-rw-rw- 2 root sys 23, 97 Jun 20 05:55 /dev/mt/tps0d3nr
crw-rw-rw- 2 root sys 23, 99 Jul 8 09:57 /dev/mt/tps0d3nrns 
crw-rw-rw- 2 root sys 23,103 Jun 20 05:55 /dev/mt/tps0d3nrnsv 
crw-rw-rw- 2 root sys 23,101 Jun 20 05:55 /dev/mt/tps0d3nrv 
crw-rw-rw- 2 root sys 23, 98 Jun 20 05:55 /dev/mt/tps0d3ns
crw-rw-rw- 2 root sys 23,102 Jun 20 05:55 /dev/mt/tps0d3nsv 
crw-rw-rw- 2 root sys 23,100 Jun 20 05:55 /dev/mt/tps0d3v 
crw-rw-rw- 1 root sys 23,102 Jun 23 09:19 /dev/tape

The major and minor device numbers are the key here. They are the two numbers separated by a comma (23 and 102 in the following line):

crw-rw-rw- 1 root sys 23,102 Jun 23 09:19 /dev/tape

Match these numbers with one of the lines for /dev/mt. In this example, they match the following line:

crw-rw-rw- 2 root sys 23,102 Jun 20 05:55 /dev/mt/tps0d3nsv

Compare the major and minor device numbers reported for /dev/tape with those reported for /dev/mt/tps0dX*. If there is no match, remove /dev/tape and /dev/nrtape and, as root, run MAKEDEV from the /dev directory:

./MAKEDEV tapelinks 

The MAKEDEV command can be verbose in describing what it is doing. Your output may differ in the number of devices made and the unit number. Once the MAKEDEV program has completed, go through these same checks again to be sure of success.

The MAKEDEV command does not let you choose which tape device to link to. You must make the links by hand if the MAKEDEV program does not default to the drive that you wish to use.
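
The comparison of major and minor device numbers described earlier in this section can be sketched as a filter over the ls -l listing. This is an illustration under stated assumptions: it expects character-device lines in the format shown above (with or without a space after the comma), and the function names devnum and tape_target are hypothetical.

```shell
#!/bin/sh
# devnum: read `ls -l` output on standard input and print
# "MAJOR,MINOR PATH" for each character-device line. Accepts both
# "23, 96" and "23,102" spacing.
devnum() {
    sed -n 's|^c.* \([0-9][0-9]*\), *\([0-9][0-9]*\) .* \(/[^ ]*\) *$|\1,\2 \3|p'
}

# tape_target: read the same listing and print the /dev/mt entry whose
# device numbers match those of /dev/tape. Returns non-zero if
# /dev/tape is absent from the listing or nothing matches.
tape_target() {
    devnum | awk '
        $2 == "/dev/tape" { want = $1 }
        $2 != "/dev/tape" { path[$1] = $2 }
        END {
            if (want != "" && (want in path))
                print path[want]
            else
                exit 1
        }
    '
}

# Example use (change the numeral 3 to the unit number of your drive):
#   ls -l /dev/tape /dev/mt/tps0d3* | tape_target
```

A non-zero exit status corresponds to the case in which the links need to be remade.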

This covers the basic problems that administrators experience regarding missing tape drives. See the following reference pages for more information on the commands used in this section: mt(1), ls(1), hinv(1M). For more technical information about tapes, see mtio(7), tps(7M), or mt(1).

Troubleshooting Tape Read Errors

Often there is a quick and simple fix for an error message that is caused by a tape drive malfunction or the tape itself. Both recoverable and unrecoverable errors can be caused by something as basic as a dirty read/write head, a poorly tensioned tape, or a dropout, which is a physically bad spot on the tape. An EOT message can also mean that there is no data on the tape.

The following basic tape maintenance steps can help prevent future errors or help you recover from an existing error:

  • Be sure your read/write head is clean.

  • Use the hinv command to determine which tape drive type is connected to your system.

  • Use the mt stat command to verify the status of the tape drive and the media.

  • Use the mt ret command to retension the tape before read or write operations.
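
The last two items can be combined into a routine that runs before each read or write. The sketch below uses only the mt stat and mt ret commands named in the list above. The MT variable is a hypothetical override added here so that the sequence can be tried as a dry run; it is not a standard environment variable, and the function name tape_preflight is illustrative.

```shell
#!/bin/sh
# tape_preflight: check the drive status, then retension the tape,
# stopping at the first command that fails.
# MT is a hypothetical override for the mt program (for example,
# MT=echo prints the mt arguments instead of running them).
tape_preflight() {
    mt_prog=${MT:-mt}
    "$mt_prog" stat || { echo "mt stat failed; check the drive" >&2; return 1; }
    "$mt_prog" ret  || { echo "mt ret failed; check the tape" >&2; return 1; }
}

# Example use:
#   tape_preflight && echo "tape is ready for reading or writing"
```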