Chapter 3. Creating a Fileset and Generating a Database

This chapter describes the fileset concept. A fileset is a file that contains the names of the files you want included in the analysis. You also specify whether these files are to be analyzed by the faster scanner mode or the slower, more thorough, parser mode.

Before you can perform any static analysis queries, you need to specify the source code files to be analyzed and then generate a database containing the static analysis information.

Fileset Specifications

A Static Analyzer fileset is a single file used to specify the source code files to be analyzed. There are several methods for creating a fileset:

  • Using the Fileset Editor

  • Creating a file manually

  • Letting cvstatic do it automatically at startup by defaulting to those files in the current directory that match the expression *.[c|C|f|F]

  • Letting cvstatic do it automatically at startup by designating an executable file

  • Using the compiler to create a fileset (and database) by adding the -sa,dbdirectory option to your Makefile


Caution: Information in this section will not work in all cases. You should be aware of the following limitations:

  • For C and C++ files, the only set of compiler options that works is the following, where cvstatic.fileset is the name of the fileset if you do not use -sa_fs (cc -o32 rejects the -sa option):

    CC -o32 -sa [-sa_fs | cvstatic.fileset]

  • CC -n32 -sa and cc -n32 -sa both produce a fileset but do not produce a database.

  • The -sa flag should be added only to a Makefile that does a sequential build. Adding the -sa flag to a Makefile that does a parallel build causes multiple copies of cc or CC to try to write to the same database. However, the database accepts only one writer at a time.


A fileset is a regular ASCII file with one entry per line; each line is separated from the next by a carriage return. The fileset always begins with the following line:

-cvstatic

The other entries can be a mixture of the following entities:

  • Regular expressions

  • File names

  • Included directories preceded by the -I designator


Note: In parser mode only, an entry can be followed by the name of the compile driver, compilation options such as -ansi, and other user-specified options such as -D for defining macros (see “Parser Mode”).
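
As a minimal sketch of the format just described, the following shell commands write a small fileset by hand. The file names (main.c, src/*.f) and the include directory are hypothetical; only the leading -cvstatic line and the -I prefix are taken from the description above.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One entry per line; the first line is always -cvstatic.
# All file names and directories below are hypothetical.
cat > cvstatic.fileset <<'EOF'
-cvstatic
main.c
src/*.f
-I/usr/local/include/mylib
EOF

# Show the result.
cat cvstatic.fileset
```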


Using Regular Expressions

Each line in the fileset can use shell expansion characters, a wild card system in standard use for specifying file names in UNIX shells. If you enter a standard pathname (either absolute or relative), the Static Analyzer reads the line literally and looks for the file. If you use metacharacters such as brackets ([]) and asterisks (*), you can specify a number of files with a single line of text. For example, the default fileset contains the single line:

*.[c|C|f|F]

The asterisk specifies any number of characters (zero or more) before a period, and the bracketed set of characters specifies any of the following single characters after the period: c, C, f, or F. The result is that the line specifies any file names in the current directory that use one of these extensions.

Do not confuse the shell expansion characters used here with the regular expressions used in the Fileset Selection Browser window; they are different systems.
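
To see which names such a line selects, you can let an ordinary shell expand the same pattern. This is only a sketch: it assumes that the Static Analyzer follows the usual shell expansion rules, and the file names are hypothetical. In the shell, the | characters inside the brackets are ordinary members of the set, so a name ending in .| would match as well.

```shell
# Scratch directory with a few hypothetical source files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch main.c shape.C solve.f io.F util.h view.cc

# The default fileset line. It is kept in a variable because a '|'
# typed unquoted on a shell command line would start a pipeline.
pattern='*.[c|C|f|F]'

# Leaving $pattern unquoted lets a POSIX shell expand it against the
# current directory; util.h and view.cc do not match.
matched=$(printf '%s\n' $pattern | LC_ALL=C sort | paste -sd ' ' -)
echo "$matched"
```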

Specifying Pathnames

The Static Analyzer resolves absolute pathnames in the fileset from the root; it resolves relative pathnames from the directory in which you invoke the Static Analyzer, referred to as the browsing directory. Anytime you change to a fileset in another directory, however, the Static Analyzer changes the working directory to match so that any relative filenames in the fileset are resolved from the fileset's own directory.

Specifying Included Files

Besides specifying file names, the fileset can also specify directories to search for included files. The default search directories are the current directory and /usr/include. Any additional search paths are specified with the prefix -I followed immediately (without a space) by the pathname. For example:

-I/usr/include/gl

This line in a fileset requests that the Static Analyzer search through /usr/include/gl for include files.

Filesets created by the Static Analyzer are named cvstatic.fileset by default. If you create your own filesets, you can give them any name you want, but by convention you should use the .fileset extension.

Defining Macros in the Fileset

The Static Analyzer lets you define macros to be included in the database. When you compile with the -sa flag, the fileset is built with one file per line; lines may also contain a -I flag for including files, -D for defining macros, or -U for undefining macros. The Static Analyzer does not normally preprocess source code files before creating a cross-reference database. Some source code, however, requires preprocessing to resolve ifdef statements before you can successfully analyze the code.

The way to perform preprocessing is to specify these symbol names and values in the file cvstatic.fileset and then run cvstatic from the command line with the -preprocess flag. Macros are specified at the end of a fileset by appending a line in the following format for each preprocessor symbol you want to define:

-Dsymbolname

or

-Dsymbolname=value

For example, to set the macros DEBUG and BUFFERSIZE, you would append two lines like the following to the end of the fileset:

-DDEBUG
-DBUFFERSIZE=8 

In a similar manner, -U undefines macros. These symbol definitions are used for processing all files in the fileset.


Note: Using the -preprocess option increases the scanning time tremendously (scanner mode only). Use it only when absolutely necessary.
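
The steps above can be sketched in the shell as follows. The starting fileset and the entry main.c are hypothetical; the two macro lines and the -preprocess flag are the ones described above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A starting fileset; main.c is a hypothetical entry.
printf '%s\n' '-cvstatic' 'main.c' > cvstatic.fileset

# Append one line per preprocessor symbol at the end of the fileset.
printf '%s\n' '-DDEBUG' '-DBUFFERSIZE=8' >> cvstatic.fileset

cat cvstatic.fileset

# The fileset would then be used with preprocessing enabled:
#   cvstatic -preprocess
```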


Using the Default Fileset

When you start the Static Analyzer in a directory that does not contain a file named cvstatic.fileset, the Static Analyzer creates a default fileset and saves it as cvstatic.fileset. The contents of the fileset are:

*.[c|C|f|F]

This line specifies any C, C++, Fortran 77, or Fortran 90 files in the working directory.


Note: This line assumes that C++ files have a .C extension, which may not be the case for all C++ files because there is not yet a pervasive extension standard. If your C++ files use .c++, .cc, or other extensions and you want to use the default fileset, you should edit it to include the extensions you want.


Using the Fileset Editor

The Fileset Editor lets you edit the contents of a fileset. You invoke it by choosing Edit Fileset from the Admin menu. The contents of the current fileset appear in the two file lists on the right side of the window; directories and files that you can add to the fileset appear in the Directories and Files lists on the left.

The Current Fileset field at the top right of the window is a read-only display that shows the full pathname of the current fileset. The directory displayed here is the Static Analyzer's current working directory. You cannot change either the fileset or the working directory here; to do so, use the Change Fileset selection in the Admin menu.

Below the Current Fileset field, there are two list areas. A fileset can contain two kinds of files: those that are scanned into and those that are parsed into the database. (For a complete discussion of scanner and parser mode, see “Generating a Static Analyzer Database”.) The top list area shows files in the fileset to be parsed, and the lower area shows files to be scanned. Both list areas have vertical scroll bars to scroll through long lists and horizontal scroll bars to move left and right through long file names.

Adding Lines to the Fileset Contents List

Both fileset list areas have entry fields immediately below them that allow you to enter lines in the fileset. You put the pointer in the line entry field and type. When you press Enter, the Fileset Editor enters your line in the fileset.

The line entry field interprets each typed line as soon as you press Enter. If you enter a literal filename such as jello.c or ../bounce/bounce.C, that filename appears in the fileset list when you press Enter. If you enter a wild card entry such as *.*, the Fileset Editor interprets it, resolving from the working directory, and places those filenames that match (not the wild card entry itself) in the fileset list.

If you want to enter a wild card entry in the fileset without having it immediately interpreted and replaced with actual filenames, turn on the Literal Input toggle button just below the line entry area. When this button is on, the Fileset Editor treats all strings you enter literally; it does not interpret them as shell expansion characters, which allows you to place wild card lines directly into the fileset. The Static Analyzer interprets these strings later when you query the fileset.
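
The difference between the two input behaviors can be illustrated with a shell analogy. This is not how the Fileset Editor is implemented; it only shows what "interpreted immediately" and "stored literally" mean for a wild card entry. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch jello.c bounce.c   # hypothetical source files

# Default behavior: the wild card is resolved immediately, so the
# matching file names are what end up in the list.
expanded=$(printf '%s\n' *.c | paste -sd ' ' -)

# Literal input: the wild card line itself is stored and is only
# interpreted later.
literal='*.c'

echo "expanded: $expanded"
echo "literal:  $literal"
```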

Removing Lines from the Fileset Lists

To remove a line from a fileset list, click on it to select it and then click the Remove button below the lists. The Fileset Editor removes the line from the list. To remove more than one line at a time, drag the cursor over a range of files or hold down the Control key while clicking, then click the Remove button.

Browsing for Fileset Contents

You can use the following lists and buttons on the left side of the Fileset Editor window to browse through available directories for files to add to the fileset.

Directories List

The Directories list shows the subdirectories available in the current directory. You can double-click on a subdirectory to move to that directory and see its subdirectories in the Directories list. The .. entry is the parent directory of the current directory. Double-click it to move up a directory.

Browsing Directory

The Browsing Directory field just above the Directories list shows the directory in which you are currently browsing. You can also use it to move to a new directory: put the pointer in the field, type an absolute pathname, and then press Enter. The contents of the Directories list change to show the subdirectories of the directory you entered.

Language Filters

The Files list below the Directories list shows the files contained in the current directory. You can filter the contents you see there by turning on any or all of the language filter buttons below the list. If none of these buttons is turned on, the Files list shows all files in the current directory. Turning on any single button restricts files listed to C, C++, or Fortran files:

  • The C button restricts files shown to those with .c extensions.

  • The C++ button restricts files shown to those with .C, .cc, or .cxx extensions.

  • The Fortran button restricts files shown to those with .f or .F extensions.

You can set combinations of these buttons to see different source code file types.

Adding File Names from Lists

If you want to add one or more file names from the Files list to one of the fileset lists, select the file name and click the Move Files Parser button or Scanner button to the right of the Files list depending on how you want information extracted from the file. The Fileset Editor puts the absolute pathname of each file in the fileset list.

To add all the files in a directory to the Fileset Contents list, select the directory name (or directory names if you want more than one) in the Directories list, then click either the Parser button or Scanner button to the right of the Directories list. The Fileset Editor (in its default state) adds only the files contained in that directory and not files contained within any of its subdirectories.

To add files contained within a directory's subdirectories, turn on the Include Subdirectories button. When you click on the Add Directories button with this button turned on, the Fileset Editor adds all files in directories, subdirectories, and so on, to the fileset lists.

You can specify the kinds of files the Fileset Editor puts in the Parser Fileset and Scanner Fileset lists when you click the Add Directories button. To do so, turn on any of the filter buttons below the Files list.

Transferring Files in the Fileset between Modes

The Fileset Editor lets you change the method of data extraction (parser or scanner) for files in the fileset. You do this by transferring them from one fileset list to the other using the two Transfer Files arrows. This is particularly useful when you discover that a file cannot be parsed. You can then transfer the file to scanner mode, which is not sensitive to programming languages.

Leaving the Fileset Editor Window

You can close the Fileset Editor window by clicking the OK button or the Cancel button. Click OK to put all the fileset changes you made into effect. Click the Cancel button to close the window and return the fileset to the state it was in when you first opened the Fileset Editor window; your editing changes are ignored.

Creating a Fileset Manually

You can create a fileset either by using a text editor that saves text in a text-only format (vi, for example) or by using the output of UNIX commands that return filenames. You may find the UNIX find(1) command useful for returning all specified filenames within a directory tree. For example, the following command creates a fileset of all Fortran 77 files (those with a .f extension) found within the current directory and all of its subdirectories:

% find . -name "*.f" -print > cvstatic.fileset

You can pipe the output of the find(1) command through filtering commands such as sed(1) to further modify the fileset created. For example, the following command finds C files within a directory tree and strips out any .c files left by the C++ compiler:

% find . -name "*.c" -print | sed '/\.\.c/d' > cvstatic.fileset
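
A slightly fuller sketch of the same approach is shown below. It writes the -cvstatic header line first and collects both C and Fortran files in one pass. The directory tree is hypothetical, and sorting the output is only a convenience for reading the result.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/math
touch src/main.c src/math/solve.f src/notes.txt   # hypothetical tree

# Write the -cvstatic header first, then every .c and .f file found
# below the current directory, sorted for a stable order.
{
    echo '-cvstatic'
    find . \( -name '*.c' -o -name '*.f' \) -print | LC_ALL=C sort
} > cvstatic.fileset

cat cvstatic.fileset
```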

Using Command-Line Options to Create and Use a Fileset

The Static Analyzer provides the following special options when you invoke cvstatic from the command line:

  • The -executable option followed by the file name of an executable file instructs the Static Analyzer to create a fileset that contains the absolute pathname of every file used to compile that executable. For example, the following command creates a fileset from the source files used to build the executable jello:

    % cvstatic -executable jello

    The executable file must not be stripped because stripped files do not contain the names of their source files. When using the -executable option, it is a good idea to use the Fileset Editor to exclude files with incomplete names that can occur with files compiled into lib using compilers prior to 4.0.1 or nonsupported languages like Assembler or Pascal. The -executable option requires that the executable file be built on the same system as that performing the static analysis.

  • The -fileset option followed by the file name of a fileset instructs the Static Analyzer to start using a fileset other than cvstatic.fileset.

  • The -mode flag takes the options SCANNER or COMPILER to indicate the types of files in the fileset to be used in queries. If you do not use the -mode flag, then scanner mode will be assumed for those files in the fileset without compiler driver specifications.
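
The options above might be used as in the following sketch. The wrapper function run_cvstatic is hypothetical and exists only so that the example can be tried on a system without cvstatic, where it prints the command line instead of running it. The names jello and custom.fileset are placeholders.

```shell
# Run cvstatic if it is installed; otherwise only print the command line.
run_cvstatic() {
    if command -v cvstatic >/dev/null 2>&1; then
        cvstatic "$@"
    else
        echo "would run: cvstatic $*"
    fi
}

# Build a fileset from the (hypothetical) unstripped executable jello.
run_cvstatic -executable jello

# Start with a fileset other than cvstatic.fileset.
run_cvstatic -fileset custom.fileset

# Use the scanner-mode files in the fileset for queries.
run_cvstatic -mode SCANNER
```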

Generating a Static Analyzer Database

The most time-consuming part of the static analysis process is creating the database, which is a collection of symbols and their relationships. The following two methods are available for extracting static analysis data from a fileset:

  • Scanner mode, which is fast but not sensitive to the characteristics of specific programming languages

  • Parser mode, which is language-specific and thus more thorough

If you need a mix of accuracy and speed, you can combine the two modes by flagging the files in the fileset according to mode and building the database with the -mode BOTH flag. You might use this approach if some files cannot be compiled or if scanner mode is misinterpreting necessary symbols.

Scanner Mode

The quickest way to build a database is to use scanner mode. Since scanner mode is not sensitive to the characteristics of specific programming languages, it may miss or incorrectly parse certain symbols (especially in Fortran). If you are analyzing a large quantity of source code, do not care about minor inaccuracies, and do not need the language-specific relationships (such as C types) available in parser mode, then use scanner mode.

Scanner mode is the default method for building a static analysis database. It is run automatically whenever you create a new fileset or perform a rescan, unless you explicitly specify parser mode.

Scanner mode creates files named cvstatic.fileset, cvstatic.index, cvstatic.posting, and cvstatic.xref in the directory in which it is started. These files comprise the Static Analyzer database for the program.

If the Static Analyzer finds cross-reference files to accompany a fileset, it determines when they were last updated. It then scans the fileset to see which files have been modified or added since that date. The Static Analyzer updates the cross-reference files with cross-references found in modified or added files.
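
The timestamp comparison described above can be approximated from the shell with find(1). This is a sketch of the idea, not the Static Analyzer's implementation; using cvstatic.xref as the reference file is an assumption, and the file names and dates are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical state: the cross-reference file was last updated in
# 2020, old.c predates it, and new.c was modified afterwards.
touch -t 201901010000 old.c
touch -t 202001010000 cvstatic.xref
touch -t 202101010000 new.c

# List the source files that are newer than the cross-reference file;
# these are the ones an update would need to look at again.
stale=$(find . -name '*.c' -newer cvstatic.xref -print)
echo "$stale"
```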

Scanner mode is based on a sophisticated pattern matcher. It works by searching for and identifying common patterns that occur in programs. Both philosophically and in terms of the actual implementation, cvstatic is most closely related to the grep(1) command. If you expect cvstatic to produce the type of results that can be accomplished only with a full-compilation type of analysis, you should use the compiler-based parser mode. If you think of scanner mode as a sort of “super grep” command and use scanner mode as most programmers use the grep command to explore a new program, you can get a quick, high-level look at your code.

Parser Mode

Parser mode is language-specific and slower as a result. Use parser mode when you need to stress accuracy over speed. Parser mode provides relationship data specific to the programming languages C, C++, and Fortran 77 such as querying on types, directories, and Fortran common blocks. Parser mode uses the compiler to identify entities in the source code, so you must be able to compile a file in order for it to be parsed. If a source file cannot compile, then you need to flag that file for scanning and run it through scanner mode.

Preparing the Fileset for Parser Mode

File entries for parser mode take the following general form:

/fullpath/sourcefile drivername options

where:

  • drivername refers to the compiler driver and can be f77 for Fortran, ncc for the Edison C compiler, NCC for the standard C++ compiler, or DCC for the Delta C++ compiler. Note that these are outmoded compilers and may not be available on your system.

  • options lets you choose language level (-ansi, -cckr, -xansi, or -ansiposix) and user-specified options such as -I for including files, -D for defining macros, -nostd, and +p.

The Static Analyzer recognizes the type of language by the file extension. Parser mode assumes that C files are ANSI unless otherwise specified in the Makefile.
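
A fileset that mixes parser-mode and scanner-mode entries in the general form shown above might look like the following sketch. All paths are hypothetical; the driver names and options are the ones listed above. Entries without a driver name are the ones left for scanner mode.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical paths. Entries followed by a driver name are parser-mode
# entries; the last entry has no driver and is left for scanner mode.
cat > cvstatic.fileset <<'EOF'
-cvstatic
/usr/people/demo/jello/jello.c ncc -ansi -DDEBUG
/usr/people/demo/jello/shape.C NCC -I/usr/people/demo/include
/usr/people/demo/jello/solve.f f77
/usr/people/demo/jello/legacy.c
EOF

# Count the entries that name one of the compiler drivers.
parser_entries=$(grep -c -E ' (f77|ncc|NCC|DCC)( |$)' cvstatic.fileset)
echo "parser-mode entries: $parser_entries"
```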

Before processing the files, the Static Analyzer must know where to look for include files. If you are using parser mode, you need to set the include paths before the Static Analyzer scans the files, so do this before performing any queries or choosing Force Scan from the Admin menu.

Invoking the Parser

There are three methods for creating a fileset with parser mode files:

  • Enter the files in the parser mode fileset list in the Fileset Editor window.

  • Edit the cvstatic.fileset file directly, specifying the compiler and other options after the file entry.

  • Use the compiler to generate the fileset by specifying the -sa[,databasedirectory] and the -nocode flags. Without arguments, the -sa flag stores the static analysis database in the current directory. If you enter a comma (,) and a database directory name, the static analysis database will be stored in the specified directory. If you specify the -nocode flag, the database will be built without creating new object files.


Caution: Information in this section will not work in all cases. You should be aware of the following limitations:

  • For C and C++ files, the only set of compiler options that works is the following, where cvstatic.fileset is the name of the fileset if you do not use -sa_fs (cc -o32 rejects the -sa option):

    CC -o32 -sa [-sa_fs | cvstatic.fileset]

  • CC -n32 -sa and cc -n32 -sa both produce a fileset but do not produce a database.

  • The -sa flag should be added only to a Makefile that does a sequential build. Adding the -sa flag to a Makefile that does a parallel build causes multiple copies of cc or CC to try to write to the same database. However, the database accepts only one writer at a time.


While the database is being built, a window appears displaying any messages from the parsing process. This helps you find problems if there is code that cannot compile.

Parser mode creates a cvstatic.fileset file and new files named cvdb*.dat, cvdb*.key, vista.taf, and cvdb.dbd in the current directory. In parser mode, Force Scan rebuilds the database. Rescan looks at the time stamps of files in the database and rebuilds pieces only when they are out-of-date.

For more information on creating a database in parser mode, see “Using the Compiler to Create a Static Analysis Database” in Chapter 2.

Parser Mode Shortcuts

If you want to use parser mode but want to avoid waiting for the process to finish, there are two ways to speed up processing:

  • You can use the compiler with the -nocode flag to skip creating object files.

  • You can build the Static Analyzer database using the compiler and bring up the graphic user interface later to read this database.

Size Limitations

The following limitations and shortcomings are largely a consequence of the grep(1)-like model supported by scanner mode. Still, cvstatic does provide a more powerful way to approach understanding a set of source files than using the grep(1) command.

When you use the Fileset Editor to add entire directories of files, you cannot enter more than 10,000 files. This limit exists to prevent someone from inadvertently starting at the root of a file system and trying to add all files. Note that there is no limitation on the number of files that can be added to the fileset when the fileset file is constructed in other ways, such as compiling source files with the -sa flag, or emitting a fileset from a Makefile rule.
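
Before adding a large directory tree through the Fileset Editor, you can check from the shell whether it is likely to exceed this limit. The following is a minimal sketch; the directory src and its contents are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/a src/b
touch src/a/one.c src/a/two.c src/b/three.f   # hypothetical tree

limit=10000   # Fileset Editor limit when adding whole directories

# Count regular files below src, including subdirectories.
count=$(find src -type f | wc -l | tr -d ' ')

if [ "$count" -gt "$limit" ]; then
    echo "src holds $count files; build the fileset another way"
else
    echo "src holds $count files; within the Fileset Editor limit"
fi
```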

The Static Analyzer displays a maximum of 20,000 lines of unfiltered results from a query in the Text View window. Larger results can, however, be saved to a file or reduced to a more manageable size by using the Results Filter.

The Static Analyzer displays no more than 5,000 functions in the Call Tree View, 10,000 files in the File Dependency View, or 10,000 classes in the Class Tree View. These are absolute maximum limits, and the actual limits may be much lower depending on characteristics of the graph being displayed. In particular, all graph views are displayed in a scrolled X Window System window, which is sized to accommodate the graph. The X Window System imposes a maximum size on windows that graphs cannot exceed. To get around this limitation, you can use one of the following methods:

  • Use more specific queries to focus on the part of the program that is of the most interest.

  • Reduce the scale used to view the graph.

  • Use the Results Filter to trim query results.

  • Use the Incremental Mode setting in the various graph views or the pop-up menus on nodes of the graph to follow a specific path through a large tree.

Rescanning the Fileset

After you have generated a database, you can always go back and rescan the fileset. The Admin menu provides two selections for this purpose:

  • Rescan: asks the Static Analyzer to check for new or modified files since the last scan and to store any cross-references found in new and modified files in the database. Use this command anytime you have modified source code files during a Static Analyzer session and you want to ensure that the Static Analyzer reflects those changes in the cross-reference files.

  • Force Scan: asks the Static Analyzer to completely rebuild the cross-reference files, creating a cross-reference database of all files specified in the fileset, whether or not they've been modified since the last scan. Force Scan also returns the Static Analyzer to its initial startup state with no query results in the main window and no past queries stored in the History menu. Use this command to restart the Static Analyzer and to verify the integrity of its cross-reference files.

There are also two command-line options involved with rescanning the fileset:

  • -batch: asks the Static Analyzer to perform the equivalent of the Rescan selection; it updates the cross-reference files to accommodate new and modified files in the fileset. It does not open the Static Analyzer's main window, however, and it quits the Static Analyzer after the scan is finished. You can use the -batch option to update cross-reference files for a large set of source code files, using the Static Analyzer as a background process. Note that this option works only if a fileset exists in the directory where you start the Static Analyzer or if you specify a fileset when you start it.

  • -noindex: stops creation of the .index and .posting files. Therefore, the Static Analyzer does not create an inverted index for the cross-reference database. This speeds database creation but slows database query response.


    Note: This works in scanner mode only.
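
Because -batch needs a fileset, a script that runs it unattended can check for one first. The following sketch assumes the default name cvstatic.fileset; the function batch_rescan is hypothetical and only prints the command on systems where cvstatic is not installed.

```shell
# Decide whether a batch rescan can run in the given directory.
# -batch needs a fileset, so check for one before starting.
batch_rescan() {
    dir=$1
    if [ ! -f "$dir/cvstatic.fileset" ]; then
        echo "no cvstatic.fileset in $dir; skipping"
        return 1
    fi
    if command -v cvstatic >/dev/null 2>&1; then
        (cd "$dir" && cvstatic -batch)
    else
        echo "would run in $dir: cvstatic -batch"
    fi
}

# Example: a scratch directory without a fileset is skipped.
emptydir=$(mktemp -d)
batch_rescan "$emptydir" || echo "rescan not started"
```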


Setting the Search Path for Included Files

Whenever the Static Analyzer scans a fileset and finds an included file in source code, it searches by default for the file in the current directory and then in /usr/include. If it does not find the included file in either of these directories, it displays a Not Found dialog box that shows the names of those included files listed but not found in its search path.

To add directories to the search path for included files, choose Set Include Path and Flags from the Admin menu to open the Scanning Options dialog box.

The Include Directories list at the top of the box lists all directories that the Static Analyzer searches in addition to the default search path. To add a directory to the list, move the pointer to the Directory field below the list, type in a directory name, then press the Enter key (or click on the Add Directory button). The path should be relative to the directory in which cvstatic is running. To delete a directory, click its name in the Include Directories list (this puts it in the Directory field), then click the Remove Directory button. You can also add flags such as -I for including files, -D for defining macros, or -U for undefining macros, as described in “Defining Macros in the Fileset”.

To exclude /usr/include from the Static Analyzer's search path, click the No Standard Includes button to turn on the option. Turn on this option whenever you do not want to scan standard libraries and headers into a .xref file. By eliminating these files from a scan, you can greatly reduce the amount of data the Static Analyzer handles, increase its speed, and concentrate query results on your custom code. However, you will not be able to find data in the header files normally found in /usr/include.

To close the Scanning Options dialog box, click the Close button. Any directories you added to the search path are stored as part of the fileset. You will not see the directories listed if you open the Fileset Editor window, but you will see them if you examine the fileset file directly because each added search directory appears in a separate line with a -I prefix.
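
Since the added search directories are stored as -I lines, you can list them from the shell without opening the dialog box. The following is a small sketch that uses a hypothetical fileset.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical fileset with two added search directories.
cat > cvstatic.fileset <<'EOF'
-cvstatic
*.[c|C|f|F]
-I/usr/include/gl
-I../include
EOF

# Print the search directories, one per line, without the -I prefix.
include_dirs=$(sed -n 's/^-I//p' cvstatic.fileset)
echo "$include_dirs"
```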

Changing to a New Fileset and Working Directory

The Static Analyzer uses only one fileset at a time, and resolves each relative pathname and general line from its current working directory. To change to a new fileset or a new working directory, use the Fileset Selection Browser window shown in Figure 3-1 by choosing Change Fileset from the Admin menu.

Figure 3-1. The Fileset Selection Browser Window

To load a new fileset, change to the directory in which the fileset is located by using the File Selection field (either by dragging a folder icon into it or by typing directly). Then select the fileset in the Files list. Once you change to a new fileset, the directory where it is located becomes the new working directory.

You can use the File Selection field of the Fileset Selection Browser window to create a new fileset from within the Static Analyzer. If you enter a new filename such as custom.fileset in the File Selection field (as part of a full pathname) and then click OK to accept your new fileset, the Static Analyzer creates a file by that name and saves any fileset edits you make to that file.