Chapter 11. Foreign File Conversion

This chapter describes data conversion: how to move data between machines and how implicit and explicit data conversion work. It also explains the support provided for reading and writing files in foreign formats, including record blocking and numeric and character conversion.

The conversion routines convert data (primarily floating-point data, but also integer and character data, as well as Fortran complex and logical data) between your system's native representation and a foreign representation.

Conversion Overview

Data can be transferred between computer systems in several ways, and several foreign formats are supported. For each foreign file type, several file and record formats are supported, and either explicit or implicit data conversion can be used.

When processing foreign data, you must consider the interactions between the data formats and the chosen method of data transfer. This section describes, in broad terms, the techniques available to do these data conversions.

Explicit data conversion is the process by which the user calls subroutines that convert native data to and from foreign data formats. These routines are provided for many data formats and are discussed in more detail in “Explicit Data Item Conversion”.

Implicit data conversion is the process by which users declare that a particular file contains foreign data and/or record blocking and then request that the run-time library perform appropriate transformations on the data to make it useful to the program at I/O time. This method of record and/or data format conversion requires changes in command scripts. This is discussed in more detail in “Implicit Data Item Conversion”.

Using `fdcp` to Transfer Files

The fdcp(1) command can handle data that is not a simple disk-resident byte stream. The fdcp command assumes that both the data and any record structure, including EOF records, can be copied from one file to another. Record structures can be preserved or removed. EOF records can either be preserved as EOF records in the output file or be used to split the delimited data in the input file into separate files.

The fdcp command does not perform data conversion; the only transformations done are on the record and file structures (fdcp transforms block, record, and file control words from one format to another).

If no assign(1) information is available for a file, the system layer is used: if the file is on disk and no assign -F attribute is specified, the syscall layer is used; if the file is on tape, the bmx layer is used. With the bmx layer, each tape block is considered a record, and user tape marks are mapped to EOF.

Data Item Conversion

Both implicit and explicit conversion of data items are provided. Explicit conversion means that the user's code must invoke the routines that convert between native and foreign representations.

Options to the assign(1) command control implicit conversion. The data types in the Fortran I/O lists direct implicit conversion. Implicit conversion is usually transparent to users and is available only to Fortran programmers. The following sections describe these data conversion types and provide direction in choosing a conversion type.

Explicit Data Item Conversion

The Fortran library contains a set of subroutines that convert between the data formats of various vendors. These routines are callable from any supported programming language. They provide an efficient way to convert data that has already been read into system central memory.

The following table lists these conversion routines.

Table 11-1. Available conversion routines

Conversion type	Foreign to native	Native to foreign
Non-IEEE	`CRY2MIPS`	`MIPS2CRY`
IEEE Fortran conversion	`IEG2MIPS`	`MIPS2IEG`
VAX Fortran conversion	`VAX2MIPS`	`MIPS2VAX`

See the individual man pages for details about the syntax and arguments for each routine.

Implicit Data Item Conversion

Implicit data conversion in Fortran requires no explicit action by the program to convert the data in the I/O stream other than using the assign command to instruct the libraries to perform conversion. For details, see the assign(1) man page.

The implicit data conversion process is performed in two steps:

Record format conversion
Data conversion

Record format conversion interprets or converts the internal record blocking structures in the data stream to gain record-level access to the data. The data contained in the records can then be converted.
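To make the two steps concrete, the following Python sketch deblocks a byte stream and then converts the items in the first record. It is an illustration only, not the library's implementation. It assumes the common f77 layout, in which each record is framed by a 4-byte length word before and after the payload; the function name `split_f77_records` and the big-endian default are assumptions made for this example.

```python
import struct


def split_f77_records(stream, byteorder=">"):
    """Return the payload of each f77-style record in `stream`.

    Assumes each record is framed as: 4-byte length, payload, 4-byte length.
    `byteorder` is a struct prefix: ">" big-endian, "<" little-endian.
    """
    records = []
    pos = 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError("truncated leading control word")
        (length,) = struct.unpack_from(byteorder + "i", stream, pos)
        start, end = pos + 4, pos + 4 + length
        if length < 0 or end + 4 > len(stream):
            raise ValueError("truncated record")
        (trailer,) = struct.unpack_from(byteorder + "i", stream, end)
        if trailer != length:
            raise ValueError("leading and trailing control words differ")
        records.append(stream[start:end])
        pos = end + 4
    return records


# Build a one-record test stream: an integer and two 32-bit reals.
payload = struct.pack(">iff", 7, 1.5, -2.25)
control = struct.pack(">i", len(payload))
stream = control + payload + control

# Step 1: record format conversion (deblocking).
records = split_f77_records(stream)

# Step 2: data conversion of the items inside the record.
values = struct.unpack(">iff", records[0])
```

Keeping the two steps separate mirrors the text: the record structure must be interpreted first, because the item boundaries are known only once the payload has been isolated.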

Using implicit conversion, you can select record blocking or deblocking alone, or you can request that the data items be converted automatically. When enabled, record format conversion and data item conversion occur transparently and simultaneously. Changes are usually not required in your Fortran code.

To enable conversion of foreign record formats, specify the appropriate record type with the assign -F command. The -N (numeric conversion) and -C (character conversion) assign options control conversion of data contained in a record. If -F is specified, but -N and -C are not, the libraries interpret the record format, but they do not convert data. You can obtain information about the type of data that will be converted (and, therefore, the type of conversion that will be performed) from the Fortran I/O list.

If -N is used and -C is not, an appropriate character conversion type is selected by default, as shown in the following table.

Table 11-2. Conversion types

`-N` option	`-C` default	Meaning
none	none	No data conversion
default	default	No data conversion
`cray`	ASCII	Non-IEEE data conversion
`mips`	ASCII	No data conversion
`user`	ASCII	User-defined data conversion
`site`	ASCII	Site-defined data conversion
`ieee`	ASCII	Generic 32-bit IEEE data conversion
`ieee_32`		(alias for above)
`ieee_64`	ASCII	Cray 64-bit IEEE data conversion
`ieee_le`	ASCII	Little-endian 32-bit IEEE data conversion
`vax`	ASCII	DEC VAX/VMS data conversion
`vms`		(alias for above)

Supported implicit data conversion includes conversion of the supported tape and disk formats and data types through standard Fortran formatted, unformatted, list-directed, and Namelist I/O and through BUFFER IN and BUFFER OUT statements. Generally, read, write, and rewind are supported for all record formats.

If you select the -N option, the libraries perform data conversion for Fortran unformatted statements and BUFFER IN and BUFFER OUT I/O statements. Data is converted according to its Fortran data type. Table 11-3 describes the conversion performed for each of the conversion types.

For numeric data conversions, most foreign data elements are defined with fewer bits than their corresponding native data elements. If the value in a native element is too large to fit in the foreign element, the foreign element is set to the largest or smallest possible value; no error is generated. When converting from a native element to a smaller foreign element, precision is also lost due to truncation of the floating-point mantissa.
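The saturation and precision-loss behavior can be sketched in Python by narrowing a 64-bit value to a 32-bit IEEE value. This is a hedged illustration rather than the library's algorithm: the helper names are hypothetical, and Python's `struct` module rounds to nearest where the text above describes truncation, so the lowest-order bit may differ from what the library produces.

```python
import struct

# Largest finite 32-bit IEEE value: (2 - 2**-23) * 2**127.
FLT_MAX = (2 - 2 ** -23) * 2.0 ** 127


def native_to_real4(value):
    """Pack a 64-bit float as big-endian 32-bit IEEE, clamping on overflow."""
    clamped = max(-FLT_MAX, min(FLT_MAX, value))
    return struct.pack(">f", clamped)


def real4_to_native(raw):
    """Unpack a big-endian 32-bit IEEE value into a 64-bit Python float."""
    return struct.unpack(">f", raw)[0]


# A value too large for the foreign element saturates; no error is raised.
too_big = real4_to_native(native_to_real4(1.0e300))

# A value that fits still loses low-order mantissa bits.
round_trip = real4_to_native(native_to_real4(0.1))
```

Because no error is reported on overflow, a program that needs to detect out-of-range values has to check them itself before the data is written.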

If the assign -N user or assign -N site command is specified, the user or site must provide the numeric data conversion routines. These routines follow the same calling conventions as the other explicit routines.

Table 11-3. Supported foreign I/O formats and default data types

Vendor data type	Record formats	Foreign data types	Native data types
IBM	`U`, `F`, `FB`, `V`, `VB`, `VBS`	`INTEGER*2` `INTEGER*4` `DOUBLE PRECISION` `COMPLEX*4` `LOGICAL*4` `CHARACTER` (EBCDIC)	`INTEGER(24/32)` `INTEGER(64)` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
VMS	`F`, `V`, `S` for tape; `bb` or `disk` and `tr` types	`INTEGER*2` `INTEGER*4` `REAL*4` `DOUBLE PRECISION` `COMPLEX*4` `LOGICAL*4` `CHARACTER` (ASCII)	`INTEGER(24/32)` `INTEGER(64)` `REAL(64)` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
CDC (60 bit)	Subtype: `DISK`, `I`, `SI` Block record: `IW`, `CW`, `CZ`, `CS`	`INTEGER` `REAL` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (display code)	`INTEGER` `REAL` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
CDC NOS/VE	`F`, `S`, `V`	`INTEGER` `REAL` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER`	`INTEGER` `REAL` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
CDC/ETA CYBER205	`W` type	`INTEGER` `REAL` `REAL*4` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (display code)	`INTEGER` `REAL` `INTEGER(24/32)` (See Note 1) `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
IEEE	None defined (often f77)	`INTEGER*2` (see Note 2) `INTEGER*4` `REAL*4` `DOUBLE PRECISION` `COMPLEX*4` `LOGICAL*4` `CHARACTER` (ASCII)	`INTEGER(24/32)` `INTEGER(64)` `REAL(64)` `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
ULTRIX	`f77.vax`	`INTEGER*2` `INTEGER*4` `REAL*4` `DOUBLE PRECISION` `COMPLEX*4` `LOGICAL*4` `CHARACTER` (ASCII)	`INTEGER(24/32)` `INTEGER(64)` `REAL(64)` (see Note 3) `DOUBLE PRECISION` `COMPLEX` `LOGICAL` `CHARACTER` (ASCII)
Note 1: The CYBER 205 half-precision type maps to the short integer (`INTEGER*2`) type
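The IBM row of Table 11-3 pairs foreign EBCDIC character data with native ASCII. As a rough illustration of that kind of character conversion, the Python sketch below uses the standard `cp037` codec. The source does not say which EBCDIC code page the library uses, so `cp037` is an assumption, and the helper names are hypothetical.

```python
# Assumption: code page 037 stands in for the foreign EBCDIC variant.
EBCDIC_CODEC = "cp037"


def ebcdic_to_ascii(raw):
    """Decode EBCDIC bytes into a native (ASCII-compatible) string."""
    return raw.decode(EBCDIC_CODEC)


def ascii_to_ebcdic(text):
    """Encode a native string as EBCDIC bytes."""
    return text.encode(EBCDIC_CODEC)


foreign = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in EBCDIC
native = ebcdic_to_ascii(foreign)
```

Character conversion is a byte-for-byte table lookup, so unlike numeric conversion it does not change the length of the data.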

For implicit conversion, specify format characteristics on an assign command.

The file being converted can be any of the following:

A magnetic tape
A disk file
A file transferred from a front end with the station

When a Fortran I/O operation is performed on the file, the appropriate file format and data conversions are performed during the I/O operation. Data conversion is performed on each data item, based on the type of the Fortran variable in the I/O list.

For example, if the first read of a foreign format file is the following, the library interprets any blocking structures in the file that precede the first data record:

READ (10) INT,FLOAT1,FLOAT2

These blocking structures vary depending on the file type and record format. The first 32 bits of data (in IBM format, for example) are extracted, sign-extended, and stored in the INT Fortran variable. The next 32 bits are extracted, converted to native floating-point format, and stored in the FLOAT1 Fortran variable.

The next 32 bits are extracted, converted, and stored into the FLOAT2 Fortran variable. The library then skips to the end of the foreign logical record. When writing from a native system to a foreign format (for example, if WRITE(10) were used in place of READ(10) in the previous example), precision is lost when converting from a 64-bit representation to a 32-bit representation.
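The item-by-item behavior of this READ can be sketched in Python for the IBM case. The sketch assumes the record has already been deblocked, that items are big-endian, and that the reals use the IBM hexadecimal layout (sign bit, 7-bit excess-64 base-16 exponent, 24-bit fraction). The function names are hypothetical, and this is not the library's code.

```python
def ibm32_to_float(raw):
    """Convert a 4-byte IBM hexadecimal float to a Python float."""
    word = int.from_bytes(raw, "big")
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                 # excess-64, base 16
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)


def read_int_float_float(record):
    """Mimic READ (10) INT,FLOAT1,FLOAT2 on one deblocked foreign record."""
    # The first 32 bits are extracted and sign-extended.
    int_value = int.from_bytes(record[0:4], "big", signed=True)
    # The next two 32-bit items are converted to native floating point.
    float1 = ibm32_to_float(record[4:8])
    float2 = ibm32_to_float(record[8:12])
    # Anything past offset 12 is skipped, as at the end of a logical record.
    return int_value, float1, float2


record = bytes.fromhex("FFFFFFFE" "42640000" "C276A000" "DEADBEEF")
result = read_int_float_float(record)
```

The types in the I/O list drive the conversion: the same 12 bytes would be decoded differently if the list named three integers instead.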

Choosing a Conversion Method

Each data conversion option has advantages and disadvantages, which are discussed in this section. Taken together, the options give you a choice of methods for processing files from front-end systems. No one option is best for all applications.

Explicit Conversion

Explicit data conversion has some distinct advantages over using station software, including the following:

Direct control over data conversion is provided (including some options not available through implicit conversion).
Programmers can control the conversion, and they can do the conversion at a convenient and appropriate time.
Conversion is usually performed on large data areas as vector operations, increasing performance.

One disadvantage of using explicit conversion is that explicit routines require changes to the source code.

Implicit Conversion

An advantage when using implicit conversion is that you do not have to change the source code.

The following are disadvantages when using implicit conversion:

Job Control Language (JCL) or script changes are required to add the assign(1) command.
Conversion is less efficient because it is performed record by record.
Conversion is done at I/O time according to the declared data types, allowing little flexibility for nonstandard requirements.

Foreign Conversion Techniques

This section contains some tips and techniques for the following conversion types:

Conversion type		Convert data to/from
IEEE conversion		Various types of workstations and different vendors that support IEEE floating-point format
VAX/VMS conversion		DEC VAX machines that run VMS

Workstation and IEEE Conversion

IRIX systems use 32-bit IEEE standard floating point, as do many workstations and personal computers. These workstations often use a dialect of UNIX software as the operating system, with twos-complement arithmetic and the ASCII character set. The logical values in these implementations are usually the same for Fortran and C. They use zero for false and nonzero for true. It is also common to see the f77 record blocking used by the Fortran run-time library on unformatted sequential files.

No IEEE record format exists, but the IEEE implicit and explicit data conversion facilities are provided on the assumption that most of these characteristics hold.

Most computer systems that use the IEEE data formats run operating systems based on UNIX software and use f77 record blocking. You can use the rcp or ftp commands to transfer files. In most cases, the following command should work for implicit conversion:

assign -F f77 -N ieee fort.1

When writing files in the f77 format, remember that you can gain a large performance boost by ensuring that the records being written fit in the working buffer of the f77 layer.

SGI MIPS systems use IEEE floating-point representation, so IEEE conversion is usually unnecessary when reading or writing IEEE data on these systems.

On MIPS systems, data types can be declared as 64 bits in size and can then be read or written directly. This is the most direct and efficient method to read or write data files for IEEE systems. You can either change the declarations of the variables in the Fortran I/O list to KIND=8 or REAL*8 (or INTEGER*8), or resize all the variables in the program by compiling with the -r8 (or -i8) compiler option.

The following are other IEEE data conversion variants; not all variants are available on all systems:

`ieee` or `ieee_32`		The default workstation conversion specification. Data sizes are based on 32-bit words.
`ieee_64`		Data sizes are based on 64-bit words.
`ieee_dp`		Data sizes are based on 32-bit words except for floating-point data, which is based on 64-bit words.
`ieee_le` or `ultrix`		Data sizes are based on 32-bit words and are little-endian.
`ieee_le_dp` or `ultrix_dp`		Data sizes are based on 32-bit words except for floating-point data, which is based on 64-bit words. All data is little-endian.
`mips`		Data sizes are based on 32-bit words except for 128-bit floating-point data which uses a "double double" format.
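The difference between a big-endian and a little-endian variant comes down to the byte order of each item. The Python sketch below interprets the same four bytes both ways. It assumes that the plain `ieee` variant is big-endian, which the text implies by singling out `ieee_le` as little-endian but does not state outright; `swap_real4` is a hypothetical helper.

```python
import struct

raw = bytes([0x3F, 0x80, 0x00, 0x00])

# The same 32 bits read with each byte order.
as_big_endian = struct.unpack(">f", raw)[0]
as_little_endian = struct.unpack("<f", raw)[0]


def swap_real4(raw4):
    """Reverse the byte order of one 32-bit item."""
    return raw4[::-1]


# After swapping, the little-endian reading recovers the original value.
recovered = struct.unpack("<f", swap_real4(raw))[0]
```

Choosing the wrong variant usually does not produce an error; it produces plausible-looking but wrong numbers, so it is worth checking a few known values after the first read.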
