Chapter 4. User-Level Access to Devices

Programmed I/O (PIO) refers to loading and storing data between program variables and device registers. This is done by setting up a memory mapping of a device into the process address space, so that the program can treat device registers as if they were volatile memory locations.

This chapter discusses the methods of setting up this mapping, and the performance that can be obtained.

Normally, PIO programs are designed in synchronous fashion; that is, the process issues commands to the device and then polls the device to find out when the action is complete. (However, it is possible for a user process to receive interrupts from some mapped devices if you have purchased the optional REACT software.)

A user-level process can perform DMA transfers from a VME bus master or (in the Challenge or Onyx series) a VME bus slave, directly into the process address space. The use of these features is covered under “VME User-Level DMA ”.

PCI Programmed I/O


Note: For an overview of the PCI bus and its hardware implementation in SGI systems, see Chapter 20, “PCI Device Attachment”. For syntax details of the user interface to PCI, see the pciba(7M) reference page. As of IRIX 6.5, the pciba user-level PCI bus adapter interface has replaced the usrpci facility.


Mapping a PCI Device Into Process Address Space

As discussed in “CPU Access to Device Registers” in Chapter 1, an I/O device is represented as an address, or range of addresses, in the address space of its bus. A kernel-level device driver has the ability to set up a mapping between an address on an I/O bus and an arbitrary location in the address space of a user-level process. When this has been done, the bus location appears to be a variable in memory. The program can assign values to it, or refer to it in expressions.

The PCI bus addresses managed by a device are not wired or jumpered into the board; they are established dynamically at the time the system attaches the device. The assigned bus addresses can vary from one day to the next, as devices are added to or removed from that PCI bus adapter. For this reason, you cannot program the bus addresses of a PCI device into software or into a configuration file.

In order to map bus addresses for a particular device, you must open the device special file that represents that device. You pass the file descriptor for the opened device to the mmap() function. If the device driver for the device supports memory mapping—mapping is an optional feature of a PCI device driver—the mapping is set up.

The PCI bus defines three address spaces: configuration space, I/O space, and memory space. Which of these spaces you are allowed to map is up to the device driver. Some device drivers may establish a convention that allows you to map more than one of the spaces.

PCI Device Special Files

Device special files for PCI devices are established in the /hw filesystem by the PCI device driver when the device is attached; see “Hardware Graph” in Chapter 2. These pathnames are dynamic. Typically, the system administrator also creates stable, predictable device special files in the /dev filesystem. The path to a specific device is determined by the device driver for that device.

The PCI bus adapter also creates a set of generic PCI device names for each PCI slot in the system. The names of these special files can be displayed by the following command:

find /hw -name pci  -print -exec ls -l {} \; 
/hw/module/1/slot/io1/xwidget/pci/0
total 0
crw-------    0 root     sys        0, 78 Aug 12 15:27 config
crw-------    0 root     sys        0, 79 Aug 12 15:27 default
crw-------    0 root     sys        0, 77 Aug 12 15:27 io
crw-------    0 root     sys        0, 75 Aug 12 15:27 mem
/hw/module/1/slot/io1/xwidget/pci/1
total 0
crw-------    0 root     sys        0, 85 Aug 12 15:27 config
crw-------    0 root     sys        0, 86 Aug 12 15:27 default
crw-------    0 root     sys        0, 84 Aug 12 15:27 io
crw-------    0 root     sys        0, 82 Aug 12 15:27 mem

The slot-numbered names shown above are directories, not leaf vertexes, and cannot be opened. However, within each of them, the names config, io, mem, and default are character special devices that can be opened by a process with the appropriate privilege. The names represent the following bus addresses:

Table 4-1. PCI Device Special File Names for User Access

Name      PCI Bus Address Space                            Offset in mmap() Call
config    Configuration space or spaces on the card in     Offset in config space.
          this slot.
default   PCI bus memory space defined by the first base   Added to BAR.
          address register (BAR) on the card.
io        PCI bus I/O space defined by this card.          Offset in I/O space.
mem       PCI bus 32-bit or 64-bit memory address space    Offset in total allocated
          allocated to this card when it was attached.     memory space.



Note: With pciba under IRIX 6.5 it is no longer possible to access config space directly by means of mmap() I/O—ioctl() calls must be used instead.


Opening a Device Special File

Either kind of pathname is passed to the open() system function, along with flags representing the type of access (see the open(2) reference page). You can use the returned file descriptor for any operation supported by the device driver. The pciba device driver supports only the mmap() and munmap() functions.

A driver for a specific PCI device may or may not support mmap(), read() and write(), or ioctl() operations.

Using mmap() With PCI Devices

When you have successfully opened a pciba device special file, you use the file descriptor as the primary input parameter in a call to the mmap() system function.

This function is documented for all its many uses in the mmap(2) reference page. For purposes of mapping a PCI device into memory, the parameters should be as follows (using the names from the reference page):

addr 

Should be NULL to permit the kernel to choose an address in user process space.

len 

The length of the span of PCI addresses to map.

prot 

PROT_READ for input, PROT_WRITE for output, or the logical sum of those names when the device will be used for both input and output.

flags 

MAP_SHARED. Add MAP_PRIVATE if this mapping is not to be visible to child processes created with the sproc() function (see the sproc(2) reference page).

fd 

The file descriptor returned from opening the device special file.

off 

The offset into the device address space.

The meaning of the off value depends on the PCI bus address space represented by the device special file, as indicated in Table 4-1.

The value returned by mmap() is the virtual address that corresponds to the starting PCI bus address. When the process accesses that address, the access is implemented by PIO data transfer to or from the PCI bus.

Map Size Limits

There are limits to the amount and location of PCI bus address space that can be mapped for PIO. The system architecture can restrict the span of mappable addresses, and kernel resource constraints can impose limits. In order to create the map, the PCI device driver has to create a software object called a PIO map. In some systems, only a limited number of PIO maps can be active at one time.

PCI Bus Hardware Errors

When the PCI bus adapter reports an addressing or access error, the error is reflected back to the device driver. This can take place long after the instruction that initiated the error transaction. For example, a PIO store to a memory-mapped PCI device can (in certain hardware architectures) pass through several layers of translation. An error could be detected several microseconds after the CPU store that initiated the write. By that time, the CPU could have executed hundreds more instructions.

When the pciba device driver is notified of a PCI Bus error, it looks up the identities of all user processes that had mapped the part of PCI address space where the error occurred. The driver then sends a SIGBUS signal to each such process. As a result of this policy, your process could receive a SIGBUS for an error it did not cause; and when your process did cause the error, the signal could arrive a long time after the erroneous transaction was initiated.

PCI PIO Example

The code in Example 4-1 demonstrates how to dump the standard configuration space registers of a device in PCI slot 1 on an Origin200 (PCI slot 1 is XIO bus slot 5 on this system).

Example 4-1. PCI Configuration Space Dump

/*
 * Use pciba to dump the registers found 
 * using base address register 0.
 *
 * See pciba(7m).
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
/*
 * Path assumes O2000/Onyx2 PCI shoebox installed
 * in first CPU module.
 */
#define MEMPATH "/hw/module/1/slot/io2/pci_xio/pci/2/base/0"
#define MEMSIZE (128)
int
main(int argc, char *argv[])
{
        volatile u_int *word_addr;
        int     fd;
        char    *path;
        int     size, newline = 0;
        path = MEMPATH;
        size = MEMSIZE;
        fd = open(path, O_RDWR);
        if (fd < 0) {
                perror("open ../base/0 ");
                return errno;
        } else {
                printf("Successfully opened %s fd: %d\n", path, fd);
                printf("Trying mmap\n");
                word_addr = (volatile u_int *)
                     mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
                if (word_addr == (volatile u_int *)MAP_FAILED) {
                        perror("mmap");
                } else {
                        int     i;
                        u_int   x;
                        printf("Dumping registers\n");
                        for (i = 0; i < 32; i++) {
                                x = word_addr[i];       /* volatile PIO load */
                                if (newline == 0)
                                        printf("0x%2.2x:", i*4);
                                printf(" 0x%8.8x", x);
                                if ((++newline % 4) == 0) {
                                        newline = 0;
                                        printf("\n");
                                }
                        }
                        munmap((void *)word_addr, size);
                }
                close(fd);
        }
        return 0;
}


EISA Programmed I/O

The EISA bus is supported in SGI Indigo2 workstations only. For an overview of the EISA bus and its implementation in SGI systems, see Chapter 18, “EISA Device Drivers”.

Mapping an EISA Device Into Memory

As discussed in “CPU Access to Device Registers” in Chapter 1, an I/O device is represented as an address or range of addresses in the address space of its bus. A kernel-level device driver has the ability to set up a mapping between the bus address of a device register and an arbitrary location in the address space of a user-level process. When this has been done, the device register appears to be a variable in memory—the program can assign values to it, or refer to it in expressions.

Learning EISA Device Addresses

In order to map an EISA device for PIO, you must know the following points:

  • which EISA bus adapter the device is on

    In all SGI systems that support it, there is only one EISA bus adapter, and its number is 0.

  • whether you need access to the EISA bus memory or I/O address space

  • the address and length of the desired registers within the address space

You can find all these values by examining files in the /var/sysgen/system directory, especially the /var/sysgen/system/irix.sm file, in which each configured EISA device is specified by a VECTOR line. When you examine a VECTOR line, note the following parameter values:

bustype 

Specified as EISA for EISA devices. The VECTOR statement can be used for other types of buses as well.

adapter 

The number of the bus where the device is attached (0).

iospace, iospace2, iospace3 

Each iospace group specifies the address space, starting bus address, and the size of a segment of bus address space used by this device.

Within each iospace parameter group you find keywords and numbers for the address space and addresses for a device. The following is an example of a VECTOR line (which must be a single physical line in the system file):

VECTOR: bustype=EISA module=if_ec3 ctlr=1
iospace=(EISAIO,0x1000,0x1000)
exprobe_space=(r,EISAIO, 0x1c80,4,0x6010d425,0xffffffff)

This example specifies a device that resides in the I/O space at offset 0x1000 (the slot-1 I/O space) for the usual length of 0x1000 bytes. The exprobe_space parameter suggests that a key device register is at 0x1c80.

Opening a Device Special File

When you know the device addresses, you can open a device special file that represents the correct range of addresses. The device special files for EISA mapping are found in /dev/eisa.

The naming convention for these files is as follows: Each file is named eisaBaM, where

B 

is a digit for the bus number (0)

M 

is the modifier, either io or mem 

The device special file for the device described by the example VECTOR line in the preceding section would be /dev/eisa/eisa0aio.

In order to map a device on a particular bus and address space, you must open the corresponding file in /dev/eisa.

Using the mmap() Function

When you have successfully opened the device special file, you use the file descriptor as the primary input parameter in a call to the mmap() system function.

This function is documented for all its many uses in the mmap(2) reference page. For purposes of mapping EISA devices, the parameters should be as follows (using the names from the reference page):

addr 

Should be NULL to permit the kernel to choose an address in user process space.

len 

The length of the span of bus addresses, as documented in the iospace group in the VECTOR line.

prot 

PROT_READ, or PROT_WRITE, or the logical sum of those names when the device is used for both input and output.

flags 

MAP_SHARED, with the addition of MAP_PRIVATE if this mapping is not to be visible to child processes created with the sproc() function (see the sproc(2) reference page).

fd 

The file descriptor from opening the device special file in /dev/eisa.

off 

The starting bus address, as documented in the iospace group in the VECTOR line.

The value returned by mmap() is the virtual memory address that corresponds to the starting bus address. When the process accesses that address, the access is implemented by data transfer to the EISA bus.


Note: When programming EISA PIO, you must always be aware that EISA devices generally store 16-bit and 32-bit values in little-endian order, with the least-significant byte at the lowest address. This is the opposite of the order used by the MIPS CPU under IRIX. If you simply assign to a C unsigned integer from a 32-bit EISA register, the value will appear to be byte-inverted.


EISA PIO Bandwidth

The EISA bus adapter is a device on the GIO bus. The GIO bus runs at either 25 MHz or 33 MHz, depending on the system model. Each EISA device access takes multiple GIO cycles, as follows:

  • The base time to do a native GIO read (of up to 64 bits) is 1 microsecond.

  • A 32-bit EISA slave read adds 15 GIO cycles to the base GIO read time; hence one EISA access takes 19 GIO cycles, best case.

  • A 4-byte access to a 16-bit EISA device requires 10 more GIO cycles to transfer the second 2-byte group; hence a 4-byte read to a 16-bit EISA slave requires 25 GIO cycles.

  • Each wait state inserted by the EISA device adds four GIO cycles.

Table 4-2 summarizes best-case (no EISA wait states) data rates for reading and writing a 32-bit EISA device, based on these considerations.

Table 4-2. EISA Bus PIO Bandwidth (32-Bit Slave, 33-MHz GIO Clock)

Data Unit Size   Read          Write
1 byte           0.68 MB/sec   1.75 MB/sec
2 bytes          1.38 MB/sec   3.51 MB/sec
4 bytes          2.76 MB/sec   7.02 MB/sec

Table 4-3 summarizes the best-case (no wait state) data rates for reading and writing a 16-bit EISA device.

Table 4-3. EISA Bus PIO Bandwidth (16-Bit Slave, 33-MHz GIO Clock)

Data Unit Size   Read          Write
1 byte           0.68 MB/sec   1.75 MB/sec
2 bytes          1.38 MB/sec   3.51 MB/sec
4 bytes          2.29 MB/sec   4.59 MB/sec


VME Programmed I/O

The VME bus is supported by Origin2000 systems. For an overview of the VME bus and its hardware implementation in SGI systems, see Chapter 12, “VME Device Attachment on Origin 2000/Onyx2”.

Mapping a VME Device Into Process Address Space

As discussed in “CPU Access to Device Registers” in Chapter 1, an I/O device is represented as an address, or range of addresses, in the address space of its bus. A kernel-level device driver has the ability to set up a mapping between the bus address of a device register and a location in the address space of a user-level process. When this has been done, the device register appears to be a variable in memory. The program can assign values to it, or refer to it in expressions.

Learning VME Device Addresses

In order to map a VME device for PIO, you must know the following points:

  • The VME bus number on which the device resides. IRIX supports as many as five VME buses. On Challenge and Onyx systems the first VME bus is number 0; on Origin and Onyx2 systems the first VME bus is number 1. Use the hinv command to see the numbers of others (and see “About VME Bus Addresses and System Addresses” in Chapter 12).

  • The VME address space in which the device resides

    This will be either A16, A24, or A32.

  • VME address space modifier that the device uses—either supervisory (s) or nonprivileged (n)

  • The VME bus addresses associated with the device

    This must be a sequential range of VME bus addresses that spans all the device registers you need to map.

This information is normally documented in VECTOR lines found in a file in the /var/sysgen/system/ directory (see “Defining VME Devices with the VECTOR Statement” in Chapter 12).

Opening a Device Special File

When you know the device addresses, you can open a device special file that represents the correct range of addresses. The device special files for VME mapping are found in the hardware graph at paths having the form:

/hw/module/mod/slot/ion/baseio/vme_xtown/pci/7/vmebus/usrvme/assm/width 

The naming convention for these hwgraph paths is documented in the usrvme(7) reference page. Briefly, the assm element combines the letter a with the address space and modifier (for example, a16n or a32s), and each path contains these variable elements:

mod 

The Origin or Onyx2 module number.

n 

The XIO slot number of the VME adapter.

ss 

The address space, either 16, 24, or 32.

m 

VME address modifier, s for supervisory or n for nonprivileged.

width 

Data width to be used, for example d32; covered in later table.

Shorter names are also created in the form

/hw/vme/busnumber/usrvme/assm/width 


Tip: In previous versions of IRIX, comparable device special files were defined in the /dev directory using names such as /dev/vme/vme0a16n and the like. If you have code that depends on these names—or if you prefer the shorter names in /dev—feel free to create compatible names in /dev in the form of symbolic links to the /hw.../usrvme names.

The data width that is designated in the pathname as width can be selected from the values shown in Table 4-4.

Table 4-4. Data Width Names in VME Special Device Names

Address Space in Pathname           Supported Widths in Pathname
a16n, a16s                          d16, d32
a24n, a24s                          d16, d32
a32n, a32s opened for PIO access    d8, d16, d32_single
a32n, a32s opened for DMA access    d8, d16, d32_single, d32_block,
                                    d64_single, d64_block

Opening a device for DMA use is described under “VME User-Level DMA ”.


Tip: You can display all the usrvme devices in the system using the find command in the /hw directory, as in


# find /hw/vme -type c -print 

Using the mmap() Function

When you have successfully opened the device special file, you use the file descriptor as the primary input parameter in a call to the mmap() system function.

This function has many different uses, all of which are documented in the mmap(2) reference page. For purposes of mapping a VME device into memory, the parameters should be as follows (using the names from the reference page):

addr 

Should be NULL to permit the kernel to choose the address in user process space.

len 

The length of the span of VME addresses, as documented in the iospace group in the VECTOR line.

prot 

PROT_READ for input, PROT_WRITE for output, or the logical sum of those names when the device will be used for both.

flags 

MAP_SHARED. Add MAP_PRIVATE if this mapping is not to be visible to child processes created with the sproc() function.

fd 

The file descriptor returned from opening the device special file.

off 

The starting VME bus address, as documented in the iospace group in the VECTOR line.

The value returned by mmap() is the virtual address that corresponds to the starting VME bus address. When the process accesses that address, the access is implemented by data transfer to the VME bus.

Limits on Maps

There are limits to the amount and location of VME bus address space that can be mapped for PIO. The system architecture can restrict the span of mappable addresses. Kernel resource constraints can impose limits on the number of VME maps that are simultaneously active. You must always inspect the return code from the mmap() call.

VME PIO Access

Once a VME device has been mapped into memory, your program reads from the device by referencing the mapped address, and writes to the device by storing into the mapped address.

Typically you organize the mapped space using a data structure that describes the layout of registers. Two key points to note about the mapped space are:

  • You should always declare register variables with the C keyword volatile. This forces the C compiler to generate a reference to memory whenever the register value is needed.

  • The VME PIO hardware does not support 64-bit integer load or store operations. For this reason you must not:

    • Declare a VME item as long long, because the C compiler generates 64-bit loads and stores for such variables

    • Apply library functions such as bcopy(), bzero(), or memmove() to the VME mapped registers, because these optimized routines use 64-bit loads and stores whenever possible.

On an Origin or Onyx2 system, a PIO read can take one or more microseconds to complete—a time in which the R10000 CPU can process many instructions from memory. The R10000 continues to execute instructions following the PIO load until it reaches an instruction that requires the value from that load. Then it stalls until the PIO data arrives from the device.

A PIO write is asynchronous at the hardware level. The CPU executes a register-store instruction that is complete as soon as the physical address and data have been placed on the system bus. The actual VME write operation on the VME bus can take 1 or more microseconds to complete. During that time the CPU can execute dozens or even hundreds more instructions from cache memory.

VME User-Level DMA

A DMA engine is included as part of each VME bus adapter in an SGI Origin2000 system. The DMA engine can perform efficient, block-mode, DMA transfers between system memory and VME bus slave cards—cards that would normally be capable of only PIO transfers.

You can use the user-level DMA (udma) functions to access a VME bus master device, if the device can respond in slave mode. However, this is normally less efficient than using the master device's own DMA circuitry.

The DMA engine greatly increases the rate of data transfer compared to PIO, provided that you transfer at least 32 contiguous bytes at a time. The DMA engine can perform D8, D16, D32, D32 Block, and D64 Block data transfers in the A16, A24, and A32 bus address spaces.

Using the DMA Library Functions

All DMA engine transfers are initiated by a special device driver. However, you do not access this driver through open/read/write system calls. Instead, you program it through a library of functions. The functions are documented in the vme_dma_engine(3) reference page. They are used in the following sequence:

  1. Call vme_dma_engine_alloc() to initialize DMA access to a particular VME bus adapter, specified by device special file name (see “Opening a Device Special File”). You can create an engine for each available bus.

  2. Call vme_dma_engine_buffer_alloc() to allocate storage to use for DMA buffers. This function pins the memory pages of the buffers to prevent swapping.

  3. You can call vme_dma_engine_buffer_addr_get() to return the address of a buffer allocated by the preceding function.

  4. Call vme_dma_engine_transfer_alloc() to create a transfer descriptor for an operation, including the buffer, the length, and the direction of transfer, as well as several other attributes. The returned descriptor (handle) can be used repeatedly.

  5. Call vme_dma_engine_schedule() to schedule one transfer (as described to vme_dma_engine_transfer_alloc()) for future execution. The transfer does not actually start at this time. This function can be called from multiple, parallel threads.

  6. Call vme_dma_engine_commit() to commence execution of all scheduled transfers. If you specify a synchronous transfer, the function does not return until the transfer is complete.

  7. If you specify an asynchronous transfer, call vme_dma_engine_rendezvous() after starting all transfers. This function does not return until all transfers are complete.
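In outline, the sequence above looks like the following pseudocode. The argument lists shown are assumptions for illustration, not the actual prototypes; see the vme_dma_engine(3) reference page for the real signatures:

```
/* 1. One engine per VME bus adapter, named by its special file. */
engine = vme_dma_engine_alloc(vme_adapter_path);

/* 2-3. Pinned buffer, and its address for program use. */
buffer = vme_dma_engine_buffer_alloc(engine, nbytes);
vaddr  = vme_dma_engine_buffer_addr_get(buffer);

/* 4. Reusable transfer descriptor: buffer, length, direction, etc. */
xfer = vme_dma_engine_transfer_alloc(engine, buffer, nbytes,
                                     vme_addr, direction, attributes);

/* 5-6. Queue the transfer, then start all scheduled transfers. */
vme_dma_engine_schedule(engine, xfer);
vme_dma_engine_commit(engine, async_mode);

/* 7. For asynchronous transfers, wait for all to complete. */
vme_dma_engine_rendezvous(engine);
```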

In prior releases, user-level DMA was provided through a comparable library of functions with different names and calling sequences. That library of functions is supported in the current release (see a prior edition of this manual, and the udmalib(3) reference page if installed). The new library described here is recommended.