Chapter 3. Static Analyzer: Creating a Fileset and Generating a Database

Before you can perform any static analysis queries, you need to specify the source code files to be analyzed in a file called a fileset and then generate a database containing the static analysis information. This chapter explains both steps.


Note: The features described in this chapter that apply to Ada are only available if you have purchased the ProDev Ada package.


Fileset Specifications

A Static Analyzer fileset is a single file used to specify the source code files to be analyzed. There are five methods for creating a fileset:

  • using the Fileset Editor

  • creating a file manually

  • letting cvstatic do it automatically at startup by defaulting to those files in the current directory that match the expression *.[cCfF]

  • letting cvstatic do it automatically at startup by designating an executable

  • using the compiler to create a fileset (and database) by adding the -sa,<dbdirectory> option to your makefile

A fileset is a regular ASCII file with a format of one entry per line, each line separated from the next by a newline. The first line of a fileset is always

-cvstatic

The other entries can be

  • regular expressions

  • filenames

  • included directories preceded by the designator -I


Note: In parser mode only, an entry can be followed by the name of the compile driver, compilation options such as -ansi, and other user-specified options such as -D for defining macros (see "Parser Mode").
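As a rough illustration of the format described above, the following sketch writes a small fileset by hand. The filenames and the scratch directory are hypothetical; the scratch directory is there only so that the example can be tried without overwriting a real fileset.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One entry per line; the first line is always -cvstatic.
# The quoted 'EOF' keeps the shell from expanding the wild card line.
cat > cvstatic.fileset <<'EOF'
-cvstatic
*.[cCfF]
jello.c
../bounce/bounce.C
-I/usr/include/gl
EOF

cat cvstatic.fileset
```

The file contains one wild card line, two literal filenames, and one included directory, in that order after the -cvstatic line.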


Using Regular Expressions

Each line in the fileset can use shell expansion characters, a wild card system in standard use for specifying filenames in UNIX shells. If you enter a standard pathname (either absolute or relative), the Static Analyzer reads the line literally and looks for the file. If you use metacharacters such as brackets ([]) and asterisks (*), you can specify a number of files with a single line of text. For example, the default fileset contains the single line:

*.[cCfF]

The asterisk specifies any number of characters (zero or more) before a period, and the bracketed set of characters specifies any one of four single characters (c, C, f, or F) after the period. The result is that the line specifies any filenames in the current directory that use a .c, .C, .f, or .F extension.

If you are analyzing Ada files, then the default expression *.[cCfF] is not appropriate. You may wish to substitute an expression like *.adb.
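To preview which files a fileset line would match, you can let the shell expand the same pattern. The sketch below uses hypothetical filenames in a scratch directory and relies only on ordinary shell expansion, not on the Static Analyzer itself.

```shell
# Work in a scratch directory with hypothetical source files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch main.c widget.C solver.f legacy.F notes.txt shape.cc pkg.adb

# Same pattern as the default fileset line.
matched=$(echo *.[cCfF])
echo "$matched"

# An Ada-oriented alternative.
ada_matched=$(echo *.adb)
echo "$ada_matched"
```

Note that shape.cc and notes.txt are not matched by *.[cCfF], because the bracketed set stands for exactly one character after the period.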


Note: Don't confuse the shell expansion characters used here with the regular expressions used in the Fileset Selection Browser; they're completely different systems. If you want full information about shell expansion characters, you'll find them described in the reference (man) pages for csh.


Specifying Pathnames

The Static Analyzer resolves absolute pathnames in the fileset from the root; it resolves relative pathnames from the directory in which you invoke the Static Analyzer, referred to as the browsing directory. Anytime you change to a fileset in another directory, however, the Static Analyzer changes the working directory to match so that any relative filenames in the fileset are resolved from the fileset's own directory.

Specifying Included Files

Besides specifying filenames, the fileset can also specify directories to search for included files. The default search directories are the current directory and /usr/include. Any additional search paths are specified with the prefix -I followed immediately (without a space) by the pathname. For example, the entry

-I/usr/include/gl

listed in a fileset asks the Static Analyzer to search through /usr/include/gl for include files.

Filesets created by the Static Analyzer are named cvstatic.fileset by default. If you create your own filesets, you can give them any name you wish, but by convention you should use the .fileset extension.

Defining Symbols in the Fileset

The Static Analyzer lets you define macros to be included in the database. When you compile with the -sa flag, the fileset is built with one file per line; lines may also contain a -I flag for including files, -D for defining macros, or -U for undefining macros. The Static Analyzer doesn't normally preprocess source code files before creating a cross-reference database. Some source code, however, requires preprocessing to resolve ifdef statements before you can successfully analyze the code.

The way to perform preprocessing is to specify these symbol names and values in the file cvstatic.fileset and then run cvstatic from the command line with the -preprocess flag. The macros are specified at the end of the fileset by appending a line of the form

-D<symbolname> 

or

-D<symbolname>=<value>

for each preprocessor symbol you want to define. For example, to set the macros DEBUG and BUFFERSIZE, you would append two lines like this to the end of the fileset:

-DDEBUG
-DBUFFERSIZE=8 

In like manner, -U undefines macros. These symbol definitions are used for processing all of the files in the fileset.
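A minimal sketch of these steps follows, assuming a fileset already exists in the directory (here a hypothetical one is created in a scratch directory). The cvstatic invocation appears only as a comment because it requires the Static Analyzer to be installed.

```shell
# Work in a scratch directory with a hypothetical starting fileset.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '-cvstatic' '*.[cCfF]' > cvstatic.fileset

# Append one line per preprocessor symbol to the end of the fileset.
printf '%s\n' '-DDEBUG' '-DBUFFERSIZE=8' >> cvstatic.fileset

cat cvstatic.fileset

# Then start the Static Analyzer with preprocessing enabled:
#   cvstatic -preprocess
```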


Note: Using the -preprocess option increases the scanning time tremendously (scanner mode only). Use it only when absolutely necessary, and consider analyzing the code as is, including all the ifdefed sections.


Using the Default Fileset

When you start the Static Analyzer in a directory that doesn't contain a file named cvstatic.fileset, the Static Analyzer creates a default fileset and saves it as cvstatic.fileset. The contents of the fileset are:

*.[cCfF]

This line specifies any C, C++, or Fortran files in the working directory. Note that the line assumes that C++ files have a .C extension, which may not be the case for all C++ files because there isn't yet a pervasive extension standard. If your C++ files use .c++, .cc, or other extensions and you want to use the default fileset, you should edit it to include the extensions you want.

Using the Fileset Editor

The Fileset Editor window (see Figure 3-1) lets you edit the contents of a fileset. You invoke it by choosing "Edit Fileset..." from the Admin menu. The contents of the current fileset appear in the two file lists on the right side of the window; directories and files that you can add to the fileset appear in the Directories and Files lists on the left.

The Current Fileset field at the top right of the window is a read-only display that shows the full pathname of the current fileset. The directory displayed here is the Static Analyzer's current working directory. You can't change either the fileset or the working directory here; to do so, use the "Change Fileset..." selection in the Admin menu.

Below the Current Fileset field, there are two list areas. A fileset can contain two kinds of files: those that are scanned into and those that are parsed into the database. (For a complete discussion of scanner and parser mode, see "Generating a Static Analyzer Database.") The top list area shows the files in the fileset to be parsed, and the lower one shows the files to be scanned. Both list areas have vertical scroll bars to scroll through long lists and horizontal scroll bars to move left and right through long filenames.

To see an example of the Fileset Editor, refer to "Tutorial 1: Applying the Static Analyzer to Scanned Files."

Figure 3-1. The Fileset Editor Window


Adding Lines to the Fileset Contents List

Both fileset list areas have direct entry fields immediately below them that allow you to enter lines in the fileset. You put the pointer in the line entry field and type. When you press <Enter>, the Fileset Editor enters your line in the fileset.

The line entry field interprets each typed line as soon as you press <Enter>. If you enter a literal filename such as jello.c or ../bounce/bounce.C, that filename appears in the fileset list when you press <Enter>. If you enter a wild card entry such as *.*, the Fileset Editor interprets it, resolving from the working directory, and places those filenames that match (not the wild card entry itself) in the fileset list.

If you want to enter a wild card entry in the fileset without having it immediately interpreted and replaced with actual filenames, turn on the Literal Input toggle button just below the line entry area. When this button is on, the Fileset Editor treats any strings you enter literally; it does not interpret them as shell expansion characters, which allows you to place wild card lines directly into the fileset. The Static Analyzer interprets these strings later when you query the fileset.

Removing Lines From the Fileset Lists

To remove a line from a fileset list, click to select it and then click the Remove button below the lists. The Fileset Editor removes the line from the list. To remove more than one line at a time, drag the cursor over a range of files or hold down the <Control> key while clicking, then click the Remove button.

Browsing for Fileset Contents

You can use the lists and buttons on the left side of the Fileset Editor window to browse through available directories for files to add to the fileset.

Directories List

The Directories list shows the subdirectories available in the current directory; double-click a subdirectory to move to that directory and see its subdirectories in the Directories list. The ".." entry is the parent directory of the current directory; double-click it to move up a directory.

Browsing Directory

The Browsing Directory field just above the Directories list shows the current directory in which you're browsing. You can use it to type an absolute pathname to a new directory—put the pointer in the area to type. When you press <Enter>, the contents of the Directories list change to show the subdirectories of the directory you entered.

Language Filters

The Files list below the Directories list shows the files contained in the current directory. You can filter the contents you see there by turning on any or all of the filter buttons below the list: the C button, the C++ button, the Fortran button, or (with the ProDev Ada package) the Ada button. If none of these buttons is turned on, the Files list shows all files in the current directory. Turning on any single button restricts the files listed to the corresponding language:

  • The C button restricts files shown to those with .c extensions.

  • The C++ button restricts files shown to those with .C, .cc, or .cxx extensions.

  • The Fortran button restricts files shown to those with .f or .F extensions.

  • The Ada button restricts files shown to those with .adb, .ali, .atb, .ads, and .ats extensions (with ProDev Ada package only).

You can set combinations of these buttons to see different source code file types.

Adding Filenames From Lists

If you wish to add one or more filenames from the Files list to one of the fileset lists, select the filenames and click the Move Files Parser button or Scanner button to the right of the Files list, depending on how you want information extracted from the files. The Fileset Editor puts the absolute pathname of each file in the fileset list.

To add all the files in a directory to the Fileset Contents list, click the directory name (or directory names if you want more than one) in the Directories list, then click either the Parser button or Scanner button to the right of the Directories list. The Fileset Editor (in its default state) adds only the files contained in that directory, and not files contained within any of its subdirectories.

To add files contained within a directory's subdirectories, turn on the Include Subdirectories button. When you click the Add Directories button with this button turned on, the Fileset Editor adds all files in directories, subdirectories, and so on, to the fileset lists.

You can specify the kinds of files the Fileset Editor puts in the Parser Fileset and Scanner Fileset lists when you click the Add Directories button. To do so, turn on any of the filter buttons below the Files list.

Transferring Files in the Fileset Between Modes

The Fileset Editor lets you change the method of data extraction (parser or scanner) for files in the fileset. You do this by transferring them from one fileset list to the other using the two Transfer Files arrows. This is particularly useful when a file that you expected to parse turns out not to be parsable; you can then transfer it to the scanner list, because scanner mode is not sensitive to programming languages.

Leaving the Fileset Editor Window

You can close the Fileset Editor window by clicking the OK button or the Cancel button. Click OK to put all the fileset changes you made into effect. Click Cancel to close the window and return the fileset to the state it was in when you first opened the Fileset Editor. Your editing changes are ignored.

Creating a Fileset Manually

You can create a fileset by hand if you wish, either by using a text editor that saves text in a text-only format (vi, for example), or by using the output of UNIX commands that return filenames. You may find the UNIX find command particularly useful for returning all specified filenames within a directory tree. For example, the command

find . -name "*.f" -print > cvstatic.fileset

creates a fileset of all Fortran files (those with a .f extension) found within the current directory and all of its subdirectories. Note that all the pathnames in the fileset are relative, determined from the current directory.

You can pipe the output of the find command through filtering commands such as sed to further modify the fileset created. For example, the command

find . -name "*.c" -print | sed '/\.\.c/d' > cvstatic.fileset

finds C files within a directory tree and strips out any ..c files left by the C++ compiler.
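The effect of the sed filter can be checked on a small directory tree. The sketch below uses hypothetical filenames and assumes that the intermediate files to be dropped have names ending in ..c, which is what the pattern \.\.c matches.

```shell
# Hypothetical tree: two ordinary C files and one intermediate "..c" file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/util
touch src/main.c src/util/list.c src/shape..c

# Collect .c files, dropping any path that contains "..c".
# sort is added only to make the order of entries predictable.
find . -name "*.c" -print | sed '/\.\.c/d' | sort > cvstatic.fileset

cat cvstatic.fileset
```

Only ./src/main.c and ./src/util/list.c remain in the fileset; ./src/shape..c is matched by find but removed by sed.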

Using Command-Line Options to Create and Use a Fileset

The Static Analyzer provides three special options when you invoke cvstatic from the command line:

  • The -executable option followed by the filename of an executable file asks the Static Analyzer to create a fileset that contains the absolute pathname of every file used to compile that executable. For example, entering

    cvstatic -executable jello

    while in the /usr/demos/CASEVision/jello directory starts the Static Analyzer and creates a fileset that includes all the files used to compile jello.

    Note that the executable must not be stripped; stripped files do not contain the names of their source files. When using the -executable option, it's a good idea to use the fileset editor to exclude files with "incomplete" names (which can occur with files compiled into lib using compilers prior to 4.0.1 or non-supported languages like assembler or Pascal). The -executable option requires that the executable be built on the same system performing the static analysis.

    Note also that this command-line option works only if you have the C, C++, or Fortran compiler that's shipped with IRIX™ version 4.0.1 or greater.

  • The -fileset option followed by the filename of a fileset asks the Static Analyzer to start using a fileset other than cvstatic.fileset.

  • The -mode flag takes the option SCANNER or COMPILER to indicate the type(s) of files in the fileset to be used in queries. If you do not use the -mode flag, scanner mode is assumed for those files in the fileset without compiler driver specifications (see "Preparing the Fileset for Parser Mode").

Generating a Static Analyzer Database

The most time-consuming part of the static analysis process is creating the database, which is a collection of symbols and their relationships. There are two methods for extracting static analysis data from a fileset:

  • scanner mode, which is fast but not sensitive to the characteristics of specific programming languages

  • parser mode, which is language-specific and thus more thorough

If you need a mix of accuracy and speed, you can combine the two modes by flagging the files in the fileset according to mode and building the database with the -mode BOTH flag. You might use this approach if some files cannot be compiled or if scanner mode is misinterpreting necessary symbols.

Scanner Mode

The quickest way to build a database is to use scanner mode. Since scanner mode is not sensitive to the characteristics of specific programming languages, it may miss or incorrectly parse certain symbols (especially in Fortran). If you are analyzing a large quantity of source code, do not care about minor inaccuracies, and do not need the language-specific relationships (such as C types) available in parser mode, then use scanner mode.

Scanner mode is the default method for building a static analysis database. It is run automatically whenever you create a new fileset or perform a rescan, unless you explicitly specify parser mode.

Scanner mode creates files named cvstatic.fileset, cvstatic.index, cvstatic.posting, and cvstatic.xref in the directory in which it is started. These files comprise the Static Analyzer database for the program.

If the Static Analyzer finds cross-reference files to accompany a fileset, it determines when they were last updated. It then scans the fileset to see which files have been modified or added since that date, and updates the cross-reference files with the cross-references found in those files.

Scanner mode is based on a sophisticated pattern matcher. It works by searching for and identifying common patterns that occur in programs. Both philosophically, and in terms of the actual implementation, cvstatic is most closely related to the program grep. If you expect cvstatic to produce the type of results that can be accomplished only with a full-compilation type of analysis, you should use the compiler-based parser mode.

If you approach scanner mode as a "super-grep," using it as most programmers currently use grep (or various "tags" packages) to explore a new program, you can quickly get a high-level look at your code.

Parser Mode

Parser mode is language-specific and slower as a result. Use parser mode when accuracy matters more than speed. Parser mode provides relationship data specific to the programming languages C, C++, and Fortran, and supports queries on types, directories, and Fortran common blocks. Parser mode uses the compiler to identify entities in the source code, so you must be able to compile a file in order for it to be parsed. If a source file cannot be compiled, you need to flag that file for scanning and run it through scanner mode.


Note: The database generated by parser mode can also be used by the C++ Browser (it must be purchased separately).


Preparing the Fileset for Parser Mode

File entries for parser mode take the general form

/fullpath/sourcefile drivername options

where

drivername 


refers to the compiler driver and can be "f77" for Fortran, "ncc" for the Edison C compiler, "NCC" for the standard C++ compiler, or "DCC" for the Delta C++ compiler.

options 


lets you choose language level (-ansi, -cckr, -xansi, or -ansiposix) and user-specified options such as -I for including files, -D for defining macros, -nostd, and +p. See the man page for cc for more information.

The Static Analyzer recognizes the language by the file extension: a .c extension is considered to be C, and .C and .cxx extensions are considered to be C++. Parser mode assumes that C files are ANSI unless otherwise specified in the makefile.
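Putting these pieces together, parser-mode entries in a fileset might look like the lines written below. This is a sketch only: the paths under /usr/people/demo are hypothetical, and the driver and option choices are examples drawn from the lists above.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Each parser-mode entry has the form: /fullpath/sourcefile drivername options
cat > cvstatic.fileset <<'EOF'
-cvstatic
/usr/people/demo/src/main.c ncc -ansi -DDEBUG
/usr/people/demo/src/shape.C NCC -I/usr/people/demo/include
/usr/people/demo/src/solve.f f77
EOF

# List the entries that name a compiler driver (a second field is present).
awk 'NF >= 2 { print $1, "->", $2 }' cvstatic.fileset
```

Entries without a driver name after the pathname would be left to scanner mode, as described under the -mode flag.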

Before processing the files, the Static Analyzer must know where to look for include files. If you are using parser mode, you need to set the include paths before the Static Analyzer scans the files, so do this before performing any queries or selecting "Force Scan."

Invoking the Parser

There are three methods for creating a fileset with parser mode files:

  • Enter the files in the parser mode fileset list in the Fileset Editor.

  • Edit the cvstatic.fileset file directly, specifying the compiler and other options after the file entry.

  • Use the compiler to generate the fileset by specifying the flags -sa[,databasedirectory] and -nocode. Without arguments, the -sa flag stores the static analysis database in the current directory. If you add a comma (,) and a directory, the static analysis database is stored in the specified directory. If you specify the flag -nocode, the database is built without creating new object files.

While the database is being built, a window appears, displaying any messages from the parsing process. This helps you find problems if there is code that cannot compile.

Parser mode creates a cvstatic.fileset file and some new files named cvdb*.dat, cvdb*.key, vista.taf, and cvdb.dbd in the current directory. In parser mode, "Force Scan" rebuilds the database. "Rescan" looks at the time stamps of files in the database and rebuilds pieces only when they are out of date.

For more information on creating a database in parser mode, see "Tutorial 3: Using the Compiler to Create a Static Analysis Database."

Parser Mode Shortcuts

If you want to use parser mode but wish to avoid waiting for the process to finish, there are two ways to speed things up:

  • You can use the compiler with the -nocode flag to skip creating object files.

  • You can build the Static Analyzer database using the compiler and bring up the graphic user interface later to read this database.

Size Limitations

The limitations and shortcomings mentioned here are largely a consequence of the grep-like model supported by scanner mode. Still, cvstatic does provide a more powerful way to approach understanding a set of source files than using grep.

When you use the Fileset Editor to add entire directories of files, you cannot enter more than 10,000 files. This limit exists to prevent someone from inadvertently starting at the root of a file system and trying to add all files. Note that there is no limitation on the number of files that can be added to the fileset when the fileset file is constructed in other ways, such as compiling source files with the -sa flag, or emitting a fileset from a Makefile rule.

cvstatic displays at most 20,000 lines of unfiltered results from a query in the Text View. Larger results can, however, be saved to a file or reduced to a more manageable size using the Results Filter.

cvstatic displays no more than 5,000 functions in the Call Tree View, 10,000 files in the File Dependency View, or 10,000 classes in the Class Tree View. These are absolute maximum limits, and the actual limits may be much lower depending on characteristics of the graph being displayed. In particular, all graph views in cvstatic are displayed in a scrolled X window, which is sized to accommodate the graph. X imposes a maximum size on windows that graphs cannot exceed. To get around this limitation, you can

  • use more specific queries to focus on the part of the program that is of the most interest

  • reduce the scale used to view the graph

  • use the Results Filter to prune the results of queries

  • use the Incremental Mode setting in graph views or the pop-up menus on nodes of the graph to follow a specific path through a large tree.

Rescanning the Fileset

After you have generated a database, you can always go back and rescan the fileset. The Admin menu provides two selections for this purpose:

"Rescan"  


asks the Static Analyzer to check for new or modified files since the last scan and to store any cross-references found in new and modified files in the database. Use this command anytime you've modified source code files during a Static Analyzer session and you want to ensure that the Static Analyzer reflects those changes in the cross-reference files.

"Force Scan"  


asks the Static Analyzer to completely rebuild the cross-reference files, creating a cross-reference database of all files specified in the fileset, whether or not they've been modified since the last scan. "Force Scan" also returns the Static Analyzer to its initial startup state with no query results in the main window and no past queries stored in the History menu. Use this command to restart the Static Analyzer and to verify the integrity of its cross-reference files.

There are also two command-line options involved with rescanning the fileset:

-batch 


asks the Static Analyzer to perform the equivalent of the "Rescan" selection; it updates the cross-reference files to accommodate new and modified files in the fileset. It doesn't open the Static Analyzer's main window, however, and it quits the Static Analyzer once the scan is finished. You can use the -batch option to update cross-reference files for a large set of source code files, using the Static Analyzer as a background process. Note that this option works only if a fileset exists in the directory where you start the Static Analyzer or if you specify a fileset when you start it.

-noindex  


asks the Static Analyzer not to create an inverted index for the cross-reference database, so it doesn't create the .index and .posting files. This makes creating a cross-reference database faster than it would be without the option, but the lack of an index makes queries to the database much slower. Use this option with caution.


Note: The -noindex option works in scanner mode only.
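Because -batch runs without opening the main window, it lends itself to scripted use. The sketch below is one hedged way to wrap it: it checks the precondition described above (a fileset in the starting directory) and skips the rescan when cvstatic is not installed. The status variable is purely illustrative.

```shell
# Refresh the cross-reference files only when cvstatic is available
# and a fileset exists in the current directory.
if command -v cvstatic >/dev/null 2>&1 && [ -f cvstatic.fileset ]; then
    cvstatic -batch
    status="rescanned"
else
    status="skipped"
fi
echo "$status"
```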

Search Path for Included Files

Whenever the Static Analyzer scans a fileset and finds an included file in source code, it searches by default for the file in the current directory and then in /usr/include. If it doesn't find the included file in either of these directories, it posts a Not Found dialog box that shows the names of those included files listed but not found in its search path.

To add directories to the search path for included files, choose "Set Include Path and Flags" from the Admin menu to open the Scanning Options dialog box shown in Figure 3-2.

The Include Directories list at the top of the box lists all directories that the Static Analyzer searches in addition to the default search path. To add a directory to the list, move the pointer to the Directory field below the list, type in a directory name, then press <Enter> (or click the Add Directory button). The path should be relative to the directory in which cvstatic is running. To delete a directory, click its name in the Include Directories list (this puts it in the Directory field), then click the Remove Directory button. You can also add flags such as -I for including files, -D for defining macros, or -U for undefining macros, as described in "Defining Symbols in the Fileset".

Figure 3-2. The Scanning Options Dialog Box


To exclude /usr/include from the Static Analyzer's search path, click the No Standard Includes button to turn on the option. Turn on this option whenever you don't want to scan standard libraries and headers into a .xref file. By eliminating these files from a scan, you can greatly reduce the amount of data the Static Analyzer handles, increase its speed, and concentrate query results on your custom code. (Note, however, that you won't be able to find data in the header files normally found in /usr/include.)

To close the Scanning Options dialog box, click the Close button. Note that any directories you added to the search path are stored as part of the fileset. You won't see the directories listed if you open the Fileset Editor, but you will see them if you examine the fileset file directly; each added search directory appears in a separate line with a -I prefix.
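Since the added search directories are visible only in the fileset file itself, a quick way to list them is to pull out the lines that carry the -I prefix. The sketch below does this on a hypothetical fileset created in a scratch directory; ../common/include is an invented path.

```shell
# Work in a scratch directory with a hypothetical fileset.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > cvstatic.fileset <<'EOF'
-cvstatic
*.[cCfF]
-I/usr/include/gl
-I../common/include
EOF

# Print only the stored search directories, with the -I prefix removed.
include_dirs=$(sed -n 's/^-I//p' cvstatic.fileset)
echo "$include_dirs"
```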

Changing to a New Fileset and Working Directory

The Static Analyzer uses only one fileset at a time, and resolves each relative pathname and general line from its current working directory. To change to a new fileset or a new working directory, use the Fileset Selection Browser window shown in Figure 3-3 by choosing "Change Fileset..." from the Admin menu.

Figure 3-3. The Fileset Selection Browser Window


To load a new fileset, change to the directory in which it's located using the File Selection field (either by dragging a folder icon into it or by typing directly), then select the fileset in the Files list. Once you change to a new fileset, the directory where it's located becomes the new working directory.

You can use the File Selection field of the Fileset Selection Browser window to create a new fileset from within the Static Analyzer. If you enter a new filename such as custom.fileset in the File Selection field (as part of a full pathname) and then click OK to accept your new fileset, the Static Analyzer creates a file by that name and saves any fileset edits you make to that file.