Chapter 10. Using FFIO

This chapter describes how you can use flexible file I/O (FFIO) with common file structures and how to enhance code performance without changing your source code.

FFIO on IRIX systems

The FFIO library calls the aio_sgi_init library routine the first time the library issues an asynchronous I/O call. It passes the following parameters to aio_sgi_init:

aio_numusers=MAX(64,sysconf(_SC_NPROC_CONF))
aio_threads=5
aio_locks=3

If a program is using multiple threads and asynchronous I/O, it is important that the value in aio_numusers be at least as large as the number of sprocs or pthreads that the application contains. See the aio_sgi_init man page for more details.

Users can change these values by setting the following environment variables to the desired value:

  • change FF_IO_AIO_THREADS to modify aio_threads

  • change FF_IO_AIO_LOCKS to modify aio_locks

  • change FF_IO_AIO_NUMUSERS to modify aio_numusers

In the following example, aio_threads is set to 8 when the FFIO routines call aio_sgi_init:

setenv FF_IO_AIO_THREADS 8
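The defaults and overrides described above can be modeled in a few lines. This is only a sketch of the resolution logic, not the library's code: the function name effective_aio_params is hypothetical, nproc stands in for sysconf(_SC_NPROC_CONF), and it assumes each environment variable holds a plain decimal integer.

```python
# Environment variable -> aio_sgi_init parameter, as listed above.
ENV_TO_PARAM = {
    "FF_IO_AIO_THREADS": "aio_threads",
    "FF_IO_AIO_LOCKS": "aio_locks",
    "FF_IO_AIO_NUMUSERS": "aio_numusers",
}


def effective_aio_params(env, nproc):
    """Model the values FFIO would pass to aio_sgi_init (hypothetical helper).

    env   -- mapping of environment variable names to string values
    nproc -- stand-in for sysconf(_SC_NPROC_CONF)
    """
    # Documented defaults.
    params = {
        "aio_numusers": max(64, nproc),
        "aio_threads": 5,
        "aio_locks": 3,
    }
    # Assumption: a variable that is set simply replaces the default.
    for var, name in ENV_TO_PARAM.items():
        if var in env:
            params[name] = int(env[var])
    return params


# Equivalent of "setenv FF_IO_AIO_THREADS 8" on a 4-processor system.
print(effective_aio_params({"FF_IO_AIO_THREADS": "8"}, nproc=4))
```

With more than 64 configured processors, the aio_numusers default follows the processor count; for a threaded program, check that it is at least the number of sprocs or pthreads.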

Users can also supersede the FFIO library's call to aio_sgi_init by calling it themselves, before the first I/O statement in their programs.

The following FFIO layers may issue asynchronous I/O calls on IRIX systems:

  • cos: see the INTRO_FFIO(3f) man page for the circumstances under which the cos layer uses asynchronous I/O.

  • cachea and bufa: users should assume that these layers may issue asynchronous I/O calls.

  • system or syscall: these layers may issue asynchronous I/O calls if called from a BUFFER IN or BUFFER OUT Fortran statement, or if called from one of the listed layers.

FFIO and Common Formats

This section describes how to use FFIO with common file structures and how the common or default file structures correspond to the FFIO specifications that handle them.

Reading and Writing Text Files

Most human-readable files are in text format. This format contains records composed of ASCII characters, with each record terminated by an ASCII line-feed character, which is the newline character in UNIX terminology. The FFIO specification that selects this file structure is assign -F text.

The FFIO package is seldom required to handle text files. In the following types of cases, however, using FFIO may be necessary:

  • Optimizing text file access to reduce I/O wait time

  • Handling multiple EOF records in text files

  • Converting data files to and from other formats

I/O speed is important when optimizing text file access. Using assign -F text is expensive in terms of CPU time, but it lets you use memory-resident files, which can reduce or eliminate I/O wait time.

The FFIO system also can process text files that have embedded EOF records. The ~e string alone in a text record is used as an EOF record. Editors such as sed(1) or other standard utilities can process these files, but it is sometimes easier with the FFIO system.
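To make the embedded EOF convention concrete, the following sketch splits a sequence of text records at every record that consists solely of ~e. It only illustrates the file layout and does not describe how the text layer is implemented; the function name split_on_eof_records is hypothetical.

```python
def split_on_eof_records(lines, eof_marker="~e"):
    """Group text records into sub-files, breaking at each EOF record.

    A record counts as an EOF record only if it contains the marker alone.
    """
    files = [[]]
    for line in lines:
        record = line.rstrip("\n")
        if record == eof_marker:
            files.append([])          # start collecting the next sub-file
        else:
            files[-1].append(record)
    return files


text = "alpha\nbeta\n~e\ngamma has ~e inside\n"
print(split_on_eof_records(text.splitlines()))
```

The last record contains ~e but is not an EOF record, because the marker does not stand alone in it.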

Use the fdcp command to copy files while converting record blocking.

Reading and Writing Unblocked Files

The simplest form of data file format is the simple binary stream or unblocked data. It contains no record marks, file marks, or control words. This is usually the fastest way to move large amounts of data, because it involves a minimal amount of CPU and system overhead.

The FFIO package provides the syscall layer, which is designed specifically to handle this binary stream of data. The unblocked binary stream is usually used for unformatted data transfer. It is not usually useful for text files or when record boundaries or backspace operations are required. The complete burden is placed on the application to know the format of the file and the structure and type of the data contained in it.

This lack of structure also allows flexibility; for example, a file declared with one of these layers can be manipulated as a direct-access file with any desired record length.
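As an illustration of that flexibility, the sketch below treats one unblocked byte stream as a direct-access file with two different record lengths. It uses an in-memory stream instead of an FFIO layer, and the helper name read_record is hypothetical.

```python
import io
import struct


def read_record(stream, recno, reclen):
    """Read 1-based record recno of reclen bytes from an unblocked byte stream."""
    stream.seek((recno - 1) * reclen)
    data = stream.read(reclen)
    if len(data) != reclen:
        raise EOFError(f"record {recno} is past the end of the data")
    return data


# Twelve 8-byte floats (0.0 .. 11.0) with no record marks or control words.
raw = io.BytesIO(struct.pack("<12d", *range(12)))

# The same bytes viewed with two different record lengths.
print(struct.unpack("<3d", read_record(raw, 2, 24)))  # (3.0, 4.0, 5.0)
print(struct.unpack("<4d", read_record(raw, 2, 32)))  # (4.0, 5.0, 6.0, 7.0)
```

Because the stream carries no structure of its own, the program alone decides where records begin and how to interpret their bytes.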

In this context, fdcp can be called to do the equivalent of the cp(1) command only if the input file is a binary stream and to remove blocking information only if the output file is a binary stream.

Reading and Writing Fixed-length Records

The most common use for fixed-length record files is for Fortran direct access. Both unformatted and formatted direct-access files use a form of fixed-length records. The simplest way to handle these files with the FFIO system is with binary stream layers, such as system, syscall, cache, and cachea. These layers allow any requested pattern of access and also work with direct-access files. The syscall and system layers, however, are unbuffered and do not give optimal performance for small records.

The FFIO system also directly supports some fixed-length record formats.

Reading and Writing COS Blocked Files

The cos layer is provided for sequential unformatted files. It handles COS blocked files on disk and on magnetic tape, and it supports multifile COS blocked datasets.

The cos layer must be specified for COS blocked files. If COS is not the default file structure, or if you specify another layer, you may have to specify a cos layer to get COS blocking.

Enhancing Performance

FFIO can be used to enhance performance in a program without changing the source code or recompiling the code. This section describes some basic techniques used to optimize I/O performance. Additional optimization options are discussed in Chapter 12, “I/O Optimization”.

Buffer Size Considerations

In the FFIO system, buffering is the responsibility of the individual layers; therefore, you must understand the individual layers in order to control the use and size of buffers.

The cos layer has high payoff potential for the user who wants to extract top performance by manipulating buffer sizes. As the following example shows, the cos layer accepts a buffer size as the first numeric parameter:

assign -F cos:42 u:1

The preceding example declares a working buffer size for the cos layer of forty-two 4096-byte blocks. This is an excellent size for a file that resides on a DD-49 disk drive because a track on a DD-49 disk drive consists of forty-two 4096-byte blocks.

If the buffer is sufficiently large, the cos layer also lets you keep an entire file in the buffer and avoid almost all I/O operations.
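The arithmetic behind these buffer sizes is simple enough to sketch. The helper names below are hypothetical. When estimating whether a whole file fits in the buffer, file_bytes is assumed to be the size of the COS blocked file as stored, which is somewhat larger than the user data because of blocking control words.

```python
BLOCK_BYTES = 4096  # unit of the cos layer's buffer-size parameter


def cos_buffer_bytes(blocks):
    """Size in bytes of a cos buffer declared as cos:<blocks>."""
    return blocks * BLOCK_BYTES


def blocks_to_hold(file_bytes):
    """Smallest number of 4096-byte blocks that covers file_bytes."""
    return -(-file_bytes // BLOCK_BYTES)  # ceiling division


print(cos_buffer_bytes(42))       # 172032 bytes, one DD-49 track
print(blocks_to_hold(1_000_000))  # 245 blocks for a 1,000,000-byte file
```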

Removing Blocking

I/O optimization usually consists of reducing overhead. One part of the overhead in doing I/O is the CPU time spent in record blocking. For many files in many programs, this blocking is unnecessary. In such cases, you can use the FFIO system to deselect record blocking and avoid that CPU overhead.

The following layers offer unblocked data transfer:

Layer      Definition

syscall    System call I/O

bufa       Buffering layer

cachea     Asynchronous cache layer

cache      Memory-resident buffer cache

You can use any of these layers alone for any file that does not require the existence of record boundaries. This includes any applications that are written in C that require a byte stream file.

The syscall layer offers a simple direct system interface with a minimum of system and library overhead. If requests are larger than approximately 32 Kbytes, this method can be appropriate, especially if the requests are a uniform multiple of 4096 bytes.

The other layers are discussed in the following sections.

The bufa and cachea Layers

The bufa and cachea layers permit efficient file processing. Both layers provide library-managed asynchronous buffering, and the cachea layer allows recently accessed parts of a file to be cached either in main memory or in a secondary data segment.

The number of buffers and the size of each buffer are tunable. In the bufa:bs:nbufs or cachea:bs:nbufs FFIO specifications, the bs argument specifies the size in 4096-byte blocks of each buffer. The nbufs argument specifies the number of buffers to use.
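When choosing bs and nbufs, it helps to know how much memory a given specification implies. The following sketch builds the specification string and estimates the buffer memory as bs × 4096 × nbufs. The function name is hypothetical, and the estimate assumes the buffers dominate memory use, ignoring any library bookkeeping.

```python
BLOCK_BYTES = 4096  # bs is expressed in 4096-byte blocks


def buffered_layer_spec(layer, bs, nbufs):
    """Return (spec string, estimated buffer bytes) for a bufa or cachea layer."""
    if layer not in ("bufa", "cachea"):
        raise ValueError("layer must be 'bufa' or 'cachea'")
    spec = f"{layer}:{bs}:{nbufs}"
    total_bytes = bs * BLOCK_BYTES * nbufs
    return spec, total_bytes


# Four buffers of 64 blocks each: 1 MiB of buffer space.
print(buffered_layer_spec("bufa", 64, 4))  # ('bufa:64:4', 1048576)
```

The resulting string is what would follow assign -F on the command line.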

The cache Layer

The cache layer permits efficient file processing for repeated access to one or more regions of a file. It is a library-managed buffer cache that contains a tunable number of pages of tunable size.

To specify the cache layer, use the following option:

assign -F cache[:[bs][:[nbufs]]]

The bs argument specifies the size in 4096-byte blocks of each cache page; the default is 8. The nbufs argument specifies the number of cache pages to use. The default is 4. You can achieve improved I/O performance by using one or more of the following strategies:

  • Use a cache page size (bs) that is a multiple of the disk's 4096-byte block size or of its track size. This improves the performance when flushing and filling cache pages.

  • Use a cache page size that is a multiple of the user's record size. This ensures that no user record straddles two cache pages. If this is not possible or desirable, it is best to allocate a few additional cache pages (nbufs).

  • Use a number of cache pages that is greater than or equal to the number of file regions the code accesses at one time.
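The second strategy can be checked numerically. The sketch below counts how many fixed-length records cross a cache page boundary for a given page size, assuming the records are laid out back to back from the start of the file. The function name and the sample values are illustrative only.

```python
BLOCK_BYTES = 4096


def straddling_records(record_bytes, page_bytes, num_records):
    """Count records whose first and last bytes fall in different cache pages."""
    count = 0
    for i in range(num_records):
        first = i * record_bytes
        last = first + record_bytes - 1
        if first // page_bytes != last // page_bytes:
            count += 1
    return count


record = 40960  # bytes per record (ten 4096-byte blocks)

# Page size that is a multiple of the record size: no record straddles.
print(straddling_records(record, 100 * BLOCK_BYTES, 1000))  # 0

# Page size that is not a multiple: some records span two pages.
print(straddling_records(record, 97 * BLOCK_BYTES, 1000))
```

Each straddling record needs two pages to be resident at once, which is why a few additional pages help when the page size cannot be aligned to the record size.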

If the number of regions accessed within a file is known, the number of cache pages can be chosen first. To determine the cache page size, divide the amount of memory to be used by the number of cache pages. For example, suppose a program uses direct access to read 10 vectors from a file and then writes the sum to a different file:

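integer VECTSIZE, NUMCHUNKS, CHUNKSIZE
parameter(VECTSIZE=1000*512)
parameter(NUMCHUNKS=100)
parameter(CHUNKSIZE=VECTSIZE/NUMCHUNKS)
real*8 a(CHUNKSIZE), sum(CHUNKSIZE)
! Unit 11: direct-access input; each record is one chunk (CHUNKSIZE*8 = 40960 bytes)
open(11,access='direct',recl=CHUNKSIZE*8)
! Unit 2: sequential unformatted output, assigned an unblocked file structure
call asnunit (2,'-s unblocked',ier)
open (2,form='unformatted')
do i = 1,NUMCHUNKS
  sum = 0.0
  ! Add chunk i of each of the 10 vectors; vector j starts at record (j-1)*NUMCHUNKS+1
  do j = 1,10
    read(11,rec=(j-1)*NUMCHUNKS+i)a
    sum=sum+a
  enddo
  write(2) sum
enddo
end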

If 4 Mbytes of memory are allocated for buffers for unit 11, 10 cache pages should be used, each of the following size:

4 MB / 10 = 400000 bytes ≈ 97 4096-byte blocks

Make the cache page size an exact multiple of the 40960-byte record length by rounding it up to 100 4096-byte blocks (= 409600 bytes, or 10 records per page), then use the following assign command:

assign -F cache:100:10 u:11
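The sizing steps above can be written out as a short calculation. This is a sketch under stated assumptions: the function name cache_page_blocks is hypothetical, 4 Mbytes is taken as 4,000,000 bytes to match the figures above, and the page size is rounded up to the next size that is a whole number of both 4096-byte blocks and records.

```python
from math import gcd

BLOCK_BYTES = 4096


def cache_page_blocks(memory_bytes, num_pages, record_bytes):
    """Cache page size (in 4096-byte blocks) for a memory budget and record length."""
    # Blocks per page that fit in the budget, rounded down.
    raw_blocks = memory_bytes // num_pages // BLOCK_BYTES
    # A page must hold whole blocks and whole records: use their least common multiple.
    unit_bytes = BLOCK_BYTES * record_bytes // gcd(BLOCK_BYTES, record_bytes)
    unit_blocks = unit_bytes // BLOCK_BYTES
    # Round up to the next multiple of that unit.
    return -(-raw_blocks // unit_blocks) * unit_blocks


record_bytes = (1000 * 512 // 100) * 8                # CHUNKSIZE*8 = 40960
bs = cache_page_blocks(4_000_000, 10, record_bytes)
print(f"assign -F cache:{bs}:10 u:11")                # assign -F cache:100:10 u:11
print(bs * BLOCK_BYTES * 10)                          # 4096000 bytes in total
```

Rounding up makes the total 4,096,000 bytes, slightly over the nominal budget; round down instead if the memory limit is strict.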