Appendix E. Programming Methods for Real-Time Digital Media Recording and Playback

This appendix explains the following real-time disk I/O concepts:

  • Direct I/O

  • Scatter/Gather I/O

  • Multiprocessing

  • Asynchronous I/O

  • File Formats

The example source for the utilities discussed in this appendix can be found in /usr/share/src/dmedia/tools. The code examples make use of Digital Media buffers (DMbuffers), a real-time data transport facility. See the Digital Media Programming Guide (document number 007-1799-060 or later, hereafter referred to as the DMPG) for more details. The emphasis here is not on how data is acquired from or transported to the video device, but rather on how data is moved to disk in real time.

The DMPG covers basic digital media programming concepts; two simple programming examples in /usr/share/src/dmedia/video/DIVO, divo_vidtomem.c and divo_memtovid.c, illustrate how video data is copied into and out of the DMbuffers for the simpler non-real-time case. At an abstract level, high-bandwidth throughput is simple; the work is in the details, as explained in this appendix.

Direct I/O

The most efficient way to move data on and off a disk device is to use the XFS filesystem with direct I/O mode and large data transfer sizes. If large transfer sizes cannot be achieved, you can combine memory pages from noncontiguous locations using writev(2) or readv(2). Finally, you can use asynchronous I/O to queue multiple I/O requests to the kernel without waiting for blocked calls to return. Other real-time software features and products, such as REACT, can be used to assure low-latency interrupts and high-priority scheduling, but are not absolutely necessary for digital media applications.

Normally, when a disk file is opened with no status flags specified, a call to write(2) for that file returns as soon as the data has been copied to a buffer managed by the device driver (see open(2)). The actual disk write may not take place until considerable time has passed. A common pool of disk buffers is used for all disk files.

Disk buffering is integrated with the virtual memory paging mechanism. A daemon executes periodically and initiates output of buffered blocks according to the age of the data and the needs of the system. You can force the writing of all pending output for a file by calling fsync(2) or by opening the file and specifying the O_SYNC flag. However, the process blocks until the data has been written to disk, and all output data must still be copied from the buffer in the user address space to a buffer in the kernel address space. See Chapter 8, “Optimizing Disk I/O for a Real-Time Program,” in the REACT Real Time Programmer's Guide for details.
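
For comparison, a minimal sketch of opening a file for synchronous writes is shown below; the data still passes through the kernel buffer cache, but each write(2) blocks until it reaches the disk (the descriptor name syncFD is illustrative).

/* synchronous but still buffered writes: each write(2) blocks
 * until the data has been written to disk */
int syncFD = open("videodata", O_WRONLY | O_CREAT | O_SYNC, 0644);
if (syncFD < 0)
    return(DM_FAILURE);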

If you use the O_DIRECT flag, writes to the file take place directly from your program's buffer, and the data is not copied to a buffer in the kernel first. Because the filesystem cache is bypassed, your application must manage buffer alignment and block size specification. To use O_DIRECT, you must transfer data in quantities that are multiples of the filesystem block size. The following code shows how to query the filesystem block size and system DMA transfer size limit.

struct dioattr da;
struct stat fileStat;
char *ioFileName = "videodata";
int ioFileFD, ioBlockSize, ioMaxXferSize;

ioFileFD = open(ioFileName, O_DIRECT | O_RDWR | O_CREAT | O_TRUNC, 0644);
if (ioFileFD < 0)
    return(DM_FAILURE);
if (fcntl(ioFileFD, F_DIOINFO, &da) < 0)
    return(DM_FAILURE);
ioBlockSize = da.d_miniosz;
ioMaxXferSize = da.d_maxiosz;

The two important constraints of direct I/O with XFS are memory address alignment and buffer length. Direct I/O requires all memory addresses to be page-aligned. XFS requires buffers to be allocated as a multiple of the filesystem block size, ioBlockSize. DMbuffers are guaranteed to be page-aligned, but to ensure that the buffers are properly padded, you must set the buffer size, bytesPerXfer, to the size of the image data you will transfer rounded up to the nearest multiple of ioBlockSize.

VLServer vlServer;
VLPath vlPath;
DMparams * paramsList;
int dmBufferPoolSize = 30;  /* 1 second of video */
int vlBytesPerImage = vlGetTransferSize(vlServer, vlPath);
int ioBlocksPerImage = (vlBytesPerImage+ioBlockSize - 1) / ioBlockSize;
int bytesPerXfer = ioBlocksPerImage * ioBlockSize;
if (dmBufferSetPoolDefaults(paramsList,dmBufferPoolSize,bytesPerXfer,
    DM_TRUE, DM_TRUE) == DM_FAILURE) {
    fprintf(stderr, "error setting pool defaults\n");
    return(DM_FAILURE);
} 

All SGI systems have a configurable maximum DMA transfer size (see systune(1M)). This value should be compared with the user's I/O request size.

if (bytesPerXfer > ioMaxXferSize) {
    fprintf(stderr, "system DMA transfer size is too small; reconfigure with systune(1M)\n");
    return(DM_FAILURE);
}

Scatter/Gather I/O

As shown in DMPG Chapter 5, “Digital Media Buffers,” and in the example programs divo_vidtomem.c and divo_memtovid.c, video data is generally transported to or from DMbuffers one image at a time using standard write and read functions that specify the number of bytes and a pointer to a buffer. However, large reads and writes usually increase I/O performance: they reduce the number of transactions between the application, the operating system, and the I/O device, and can allow the device to optimize some of its activities. These advantages are especially significant with disk arrays.

Since the DMbuffer's memory pages are not guaranteed to be contiguous, standard reads or writes cannot be made across multiple buffers. The readv(2) and writev(2) interfaces allow an application to provide a list of I/O vectors, which are data structures consisting of an address and byte-count pair. Because the list of vectors is submitted to the operating system as a unit, it can be treated as a single large I/O request. Using readv() and writev() with direct I/O is particularly efficient.

The restrictions on buffer alignment and block size for readv()/writev() are similar to those of direct I/O. The address for each I/O vector must be page-aligned, and the length of each I/O vector must be a multiple of the system page size rather than of the filesystem block size, as is the case with plain direct I/O. Thus, the easy solution is to always use the larger of the two values, page size or filesystem block size. This requirement wastes some space, but it is necessary to maintain functionality and performance. This calculation must be performed before dmBufferSetPoolDefaults(3dm) is called.

int ioAlignment;

/* ioBlockSize was set from da.d_miniosz in the direct I/O fragment above */
ioAlignment = getpagesize();
if (ioAlignment > ioBlockSize)
    ioBlockSize = ioAlignment;

The maximum allowable number of I/O vectors can be queried with sysconf(3C).

int ioVecCount = 2; /* set default to two images */
long ioVecCountMax;

/* check for range */
ioVecCountMax = sysconf(_SC_IOV_MAX);
if (ioVecCount > ioVecCountMax) {
    ioVecCount = ioVecCountMax;
    fprintf(stderr, "cannot create more than %ld I/O vectors\n",
        ioVecCountMax);
}
else if (ioVecCount <= 0)
    ioVecCount = 2;

The aggregate size of all the I/O vectors cannot exceed the maximum DMA transfer size, so you must check for this condition and adjust the number of I/O vectors if necessary:

if (bytesPerXfer * ioVecCount > ioMaxXferSize)
    ioVecCount = ioMaxXferSize / bytesPerXfer;
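
The I/O vector array itself, referred to as videoData in the recording fragment below, must also be allocated, with one entry per video image. A minimal sketch follows; struct iovec is declared in <sys/uio.h>.

/* one I/O vector (address/length pair) per video image in a
 * single readv()/writev() request */
struct iovec *videoData;

videoData = (struct iovec *)calloc(ioVecCount, sizeof(struct iovec));
if (videoData == NULL)
    return(DM_FAILURE);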

When you work with video data using readv()/writev(), it is much easier to manage frames or an even number of fields with one I/O vector per field or frame. Most SGI video devices can support either field or frame mode, which is selected with the VL_CAPTURE_TYPE device control (see Chapter 4, “Video I/O Concepts” in the Digital Media Programming Guide). Hereafter, the term video image refers to a video data quantum: field or frame, depending on how the hardware is set up. The restriction of working on frame or even field boundaries is also relevant to the data file format, which is discussed at the end of this appendix.
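
The following sketch shows one way to select the capture type with vlSetControl(3dm). It assumes vlDrnNode is the memory drain node used in the recording fragment below; consult the DMPG for the capture types your device supports.

VLControlValue ctl;

/* capture interleaved frames; use VL_CAPTURE_NONINTERLEAVED or
 * VL_CAPTURE_FIELDS for field-based transfers */
ctl.intVal = VL_CAPTURE_INTERLEAVED;
if (vlSetControl(vlServer, vlPath, vlDrnNode, VL_CAPTURE_TYPE, &ctl) < 0) {
    vlPerror("VL_CAPTURE_TYPE");
    return(DM_FAILURE);
}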

The following code fragment illustrates writing to disk. Upon the successful capture of a video image, the VLTransferComplete event is placed on the event queue. A pointer to a valid DMbuffer is returned by vlDMBufferGetValid(3dm); then the actual video data is mapped into user space. Data is not written to disk until there are enough video images to complete an I/O vector.

case VLTransferComplete:

   /* loop until we get a valid buffer */
   while (((retval = vlDMBufferGetValid(vlServer, vlPath, vlDrnNode,
    &dmBuffers[dmbuffer_index])) != VLSuccess) && (vlErrno == VLAgain))
                        sginap(1);
    if (retval == VLSuccess) {
        /* map data to I/O vectors */
        (videoData+iov_index)->iov_base =
                dmBufferMapData(dmBuffers[dmbuffer_index]);
        (videoData+iov_index)->iov_len  = bytesPerXfer;

        /* increment the buffer index for the next image */
        dmbuffer_index = (dmbuffer_index+1) % dmBufferPoolSize;

        /* write data to disk when we have enough I/O vectors */
        if (!(++iov_index % ioVecCount)) {
            first_index = vlXferCount - iov_index + 1;
            /* seek to the slot of the first image in this I/O vector */
            dataOffset = (off64_t) first_index *
                          (off64_t) bytesPerXfer;
            /* seek to the correct position in the file, must use
             * lseek64() as the 64-bit offset value is necessary for
             * XFS filesystems larger than 2 gigabytes
             */
            if (lseek64(ioFileFD, dataOffset, SEEK_SET) != dataOffset)
                return(DM_FAILURE);

            /* write the I/O vector to disk */
            if (writev(ioFileFD, videoData, ioVecCount) < 0)
                return(DM_FAILURE);

            /* the dmbuffers are managed as a ring buffer,
             * dmbuffer_free_index points to the next free buffer */
            for (i=0, dmbuffer_free_index = (dmbuffer_index - 1);
                    i < iov_index; i++, dmbuffer_free_index--) {
                if (dmbuffer_free_index < 0)
                    dmbuffer_free_index = dmbuffer_max_index;

                dmBufferFree(dmBuffers[dmbuffer_free_index]);
            }

            /* write the QuickTime movie offset data */
            if (mvFormat == MV_FORMAT_QT) {
                last_index = first_index + iov_index;
                if (write_qt_offset_data() == DM_FAILURE)
                    return(DM_FAILURE);
            }

            /* reset the I/O vector index */
            iov_index = 0;
        }
        vlXferCount++;
    }
    else {
        fprintf(stderr, "cannot get a valid DM buffer: %s\n",
                vlStrError(vlErrno));
    }
    break;

The example for reading data from disk can be found in /usr/share/src/dmedia/tools.
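
A minimal sketch of the read direction is shown below. It assumes the same ioVecCount, bytesPerXfer, and videoData array as the recording fragment, a buffer pool dmPool created with dmBufferCreatePool(3dm), and a caller-supplied starting image number first_index; queueing each filled buffer to the video output path is omitted (see the DMPG).

/* allocate empty DMbuffers and map their data areas into the I/O vector */
for (iov_index = 0; iov_index < ioVecCount; iov_index++) {
    if (dmBufferAllocate(dmPool, &dmBuffers[iov_index]) == DM_FAILURE)
        return(DM_FAILURE);
    (videoData+iov_index)->iov_base = dmBufferMapData(dmBuffers[iov_index]);
    (videoData+iov_index)->iov_len  = bytesPerXfer;
}

/* seek to the first requested image and read ioVecCount images
 * in a single request */
dataOffset = (off64_t) first_index * (off64_t) bytesPerXfer;
if (lseek64(ioFileFD, dataOffset, SEEK_SET) != dataOffset)
    return(DM_FAILURE);
if (readv(ioFileFD, videoData, ioVecCount) < 0)
    return(DM_FAILURE);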

Multiprocessing

Some aspects of digital media programming lend themselves to a multiprocessing programming model. On a multiprocessor system, the various tasks of moving multiple streams of video and audio data on and off disk, serial I/O control of external video equipment and input devices, processing of video data, or the transport of video data in and out of the graphics framebuffer can be assigned to different processors. New processes must be created with all virtual space attributes (shared memory, mapped files, data space) shared. The following fragment illustrates how to create a process to perform video recording.

if ((video_recorder_pid = sproc(video_recorder, PR_SADDR|PR_SFDS))<0){
    perror("video_recorder");
    exit(DM_FAILURE);
}

If you use multiprocessing, note the following caveats:

  • When VL calls are made, VL objects such as VLServer, VLPath, and VLNode are passed through the kernel to the video driver. Note that you cannot create any VL objects without first creating a VLServer, from which everything else is instanced.

  • In a process share group, only one VL call whose arguments derive from a VLServer can execute at a time. This requirement applies even to VL calls that do not explicitly take a VLServer as an argument (for example, vlBufferAdvise(3dm)).

  • You can use objects derived from a given VLServer in any number of threads as long as you use a locking scheme, such as usnewsema(3P) or pthread_mutex_init(3P), to make the use in each thread mutually exclusive of a use in any of the other threads (a minimal locking sketch follows this list).

The VL error state, returned by vlGetErrno(3dm), is global to a share group, not per VLServer. If a VL call using one VLServer in one thread executes simultaneously with a VL call using another VLServer in another thread, both calls try to set the error state returned by vlGetErrno(). Ideally, this error state would be per-thread rather than global to the entire process share group.
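
A minimal sketch of the locking scheme mentioned in the last bullet above, using usnewsema(3P) with a shared arena (usinit() and usnewsema() are declared in <ulocks.h>; the arena path is illustrative):

usptr_t *vlArena;
usema_t *vlLock;

/* create a shared arena and a mutual-exclusion semaphore once,
 * before calling sproc() */
vlArena = usinit("/tmp/vl_lock_arena");
if (vlArena == NULL)
    return(DM_FAILURE);
vlLock = usnewsema(vlArena, 1);

/* in any thread of the share group, bracket VL calls that use
 * objects derived from the same VLServer */
uspsema(vlLock);
/* ... VL calls using vlServer, vlPath, vlDrnNode, and so on ... */
usvsema(vlLock);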

Asynchronous I/O

Asynchronous I/O allows an application to process multiple read or write requests simultaneously. On SGI platforms, asynchronous I/O is available through the aio facility. This facility, based on sproc(2)'ed processes, provides all of the benefits of multiprocessing for free. Because multiple I/O requests can be outstanding when you use asynchronous I/O, the round-trip delay between making a request, having it serviced, and issuing another request is removed. Any process-scheduling delay between these steps is also eliminated.

Because asynchronous I/O operations complete out of sequence, the application must keep track of the order in which data appears in the DMbuffers. DMbuffers are contained in a DMbufferPool; the pool itself is unordered and buffers can be obtained and returned to the pool in any order. Ordering is achieved by a first-in-first-out queue and maintained only while the buffers reside in the queue. The application is free to impose any processing order once buffers are dequeued.
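
A minimal sketch of queueing one disk write through the POSIX aio interface (declared in <aio.h>) follows; the bookkeeping that records which image each request carries is reduced to comments, and the 64-bit variants of these calls may be needed for files larger than 2 GB.

struct aiocb aioRequest;

/* describe one asynchronous write of a single video image */
memset(&aioRequest, 0, sizeof(aioRequest));
aioRequest.aio_fildes = ioFileFD;
aioRequest.aio_buf    = dmBufferMapData(dmBuffers[dmbuffer_index]);
aioRequest.aio_nbytes = bytesPerXfer;
aioRequest.aio_offset = vlXferCount * bytesPerXfer;

/* queue the request; control returns to the application immediately */
if (aio_write(&aioRequest) < 0)
    return(DM_FAILURE);

/* later: poll for completion, then return the DMbuffer to the pool.
 * because requests can complete out of order, the application must
 * remember which image each aiocb refers to. */
while (aio_error(&aioRequest) == EINPROGRESS)
    sginap(1);
if (aio_return(&aioRequest) != bytesPerXfer)
    return(DM_FAILURE);
dmBufferFree(dmBuffers[dmbuffer_index]);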

File Formats

Each time a DMbuffer is written to disk, an offset must be recorded for the QuickTime file.

MVid theMovie;
MVid mvImageTrack;
off64_t mvFieldGap= bytesPerXfer - vlBytesPerImage;
MVtimescale mvImageTimeScale=MV_IMAGE_TIME_SCALE_NTSC;
int mvFrameTime = 1001;  /* for NTSC */
off64_t meta_data_offset;
int mv_frame_index;
MVframe mv_dummy_offset;
int i;

mvInsertTrackDataAtOffset(
        mvImageTrack,
        1,
        (MVtime) (i * mvFrameTime),
        (MVtime)  mvFrameTime,
        mvImageTimeScale,
        (off64_t) meta_data_offset,
        vlBytesPerImage,
        MV_FRAMETYPE_KEY,
        0);
/* get the index for the libmovie data corresponding to this field.
 * this is necessary in order to set the gap and field sizes for the
 * fields in the frame.*/
mvGetTrackDataIndexAtTime(
        mvImageTrack,
        (MVtime) (i * mvFrameTime),
        mvImageTimeScale,
        &mv_frame_index,
        &mv_dummy_offset);

/* tell libmovie the field gap and sizes for each field in the frame */
mvSetTrackDataFieldInfo(
        mvImageTrack,
        mv_frame_index,
        vlBytesPerImage,      /* absolute size of field 1 */
        mvFieldGap,           /* gap between fields */
        vlBytesPerImage);     /* absolute size of field 2 */

When data recording completes, the following function must be called to close the QuickTime file properly.

int
write_qt_file_header(void)
{
    int flags;

    /* if direct I/O mode is enabled, disable it because the
     * movie library does not do direct I/O
     */
    if (ioFileFD) {
        fsync(ioFileFD);
        flags = fcntl(ioFileFD, F_GETFL);
        flags &= ~FDIRECT;
        if (fcntl(ioFileFD, F_SETFL, flags) < 0) {
            fprintf(stderr, "unable to reset direct I/O file status\n");
            return(DM_FAILURE);
        }
    }

    if (mvClose(theMovie) == DM_FAILURE) {
        fprintf(stderr, "unable to write movie file header %s\n",
                    mvGetErrorStr(mvGetErrno()));
        return(DM_FAILURE);
    }

    return(DM_SUCCESS);
}