Chapter 6. Static Analyzer: Working on Large Programming Projects

The Static Analyzer is a flexible tool. It works on uncompilable code, analyzes filesets containing files from completely different programs, and presents query results in a graphical form that's easy to browse. This same flexibility can bring unproductive results, however, if you use the Static Analyzer carelessly on the hundreds of thousands (or millions) of lines of code typical of a large programming project. You must narrow your analysis to a meaningful portion of your project; otherwise the Static Analyzer may spend hours returning results so extensive that they have little meaning.

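This chapter recommends techniques to help you get the best results when using the Static Analyzer for large programming projects. It covers creating a fileset using a shell script, customizing the fileset for individual code modules, using the Results Filter to focus queries, applying group analysis techniques, and viewing suggestions.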

Creating a Fileset Using a Shell Script

Creating a fileset for a large programming project can be difficult to do by hand because the source code files may be scattered throughout many different directories. If so, you can use a shell script to create a fileset for you.

A Fileset Shell Script

Example 6-1 shows a shell script that searches a list of directories for files whose names end in extensions that indicate source code. You can modify it to suit your project.

Example 6-1. Script for Creating Filesets


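rm -f cvstatic.fileset
DIRS="/usr/local/src /usr/src"
EXTENSIONS="*.c++ *.c *.f"
set -f    # keep the shell from expanding the patterns in $EXTENSIONS itself
for DIR in $DIRS
do
    for EXT in $EXTENSIONS
    do
        # append every file under $DIR whose name matches $EXT
        find "$DIR" -name "$EXT" -print >> cvstatic.fileset
    done
done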

The first line removes the old fileset. The second line assigns the list of directories you want searched to the variable DIRS. Put the pathname of each directory you want searched between the quotes following DIRS, and separate the pathnames with spaces.

The third line assigns to the variable EXTENSIONS the list of filename patterns to search for. Each entry uses shell metacharacters; in this example, the script looks for any filenames that end in .c++, .c, or .f. To search for different extensions, write a pattern for each extension you want (for example, *.h for header files), put the entries between the quotes following EXTENSIONS, and separate them with spaces.

The nested loop at the end of the script runs find once for each combination of a directory in DIRS and a pattern in EXTENSIONS, searching each directory and its subdirectories for matching files. It appends the pathname of every match to the file cvstatic.fileset, one per line, a form that the Static Analyzer reads as a fileset.

Once you create a fileset with a shell script, you should look at the fileset before you make any queries. If you find libraries included in the fileset, you may want to remove them so that you don't have to analyze the internal workings of each library function. You may also want to remove all files that don't apply to your specific area of the project.
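A minimal sketch of that review step, assuming the fileset is a plain list of pathnames, one per line, as Example 6-1 writes it. The function names summarize_fileset and prune_fileset are hypothetical, and the pattern you pass to prune_fileset must match wherever your project actually keeps library sources.

```shell
# Count how many fileset entries come from each directory, largest first,
# so that library directories that dominate the list stand out.
summarize_fileset() {
    sed 's|/[^/]*$||' "$1" | sort | uniq -c | sort -rn
}

# Remove every entry whose path matches the extended regular expression $2,
# rewriting the fileset in place through a temporary copy.
prune_fileset() {
    grep -Ev -- "$2" "$1" > "$1.tmp"
    mv "$1.tmp" "$1"
}
```

For example, summarize_fileset cvstatic.fileset shows where the entries come from, and prune_fileset cvstatic.fileset '/lib/' then drops every entry under a directory named lib.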

Customizing the Fileset for Individual Code Modules

Most programming projects organize their source code into modules, with individual programmers taking responsibility for different sets of modules. Because the Static Analyzer can analyze each module separately, even if the module won't compile without other parts of the system, it's wise to apply it only to the modules you're working on. You can then see your own code in detail and see calls into other modules without having to view the contents of those modules. You also reduce the size of the cross-reference database you work with, which cuts the time the Static Analyzer takes to refresh the database and to complete queries of it.
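If each module lives in its own directory, you can derive a per-module fileset from a project-wide one. This sketch assumes a full fileset already exists (for example, one produced by a script like Example 6-1); the function name module_fileset and the directory names in the usage line are hypothetical.

```shell
# Write to stdout only those fileset entries that lie under one of the
# module directories given after the fileset name.
# POSIX sh has no local variables, so the names carry an mf_ prefix.
module_fileset() {
    mf_fileset=$1
    shift
    for mf_dir in "$@"
    do
        # Match the directory as a path prefix ending in "/", so that
        # /proj/src/net does not also match /proj/src/network.
        awk -v d="${mf_dir%/}/" 'index($0, d) == 1' "$mf_fileset"
    done
}
```

A call such as module_fileset full.fileset /proj/src/net /proj/src/ui > cvstatic.fileset leaves a fileset containing only the net and ui modules.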

Using the Results Filter to Focus Queries

Once you create a reduced fileset, you can further improve the efficiency of your analysis by setting the Static Analyzer's Results Filter (shown in Figure 6-1).

Two settings are particularly useful for large programming projects: "Headers" and "External Functions." If you set "Headers" to Exclude, you prevent the Static Analyzer from taking the time to display query results that come from header files. And if you set "External Functions" to Exclude, you ensure that the Static Analyzer doesn't display query results from libraries and other non-fileset files.

For example, consider the function foo(), which calls bar(), a function in the fileset. It also calls XtCreateWidget(), a library function that isn't in the fileset. If you set "External Functions" to Exclude and then make the query "Who Is Called By foo?", the Static Analyzer will display only bar().

Figure 6-1. Results Filter

Although the Results Filter doesn't reduce the time the Static Analyzer takes to make a query, it does reduce the time it takes to display the results, a substantial gain if you're using a tree view to display the results of comprehensive queries.

Applying Group Analysis Techniques

Although it's good practice for individual programmers to limit the amount of source code they analyze with the Static Analyzer to just the modules for which they're responsible, sometimes it's useful to analyze all the files in the programming project. For example, library programmers may want to know every function that calls a specific library function; that way, they know what software is affected by changes they make to the library function.

For this and similar cases, create a comprehensive cross-reference database on a project workstation, as shown in Figure 6-2. Users on personal workstations can then query the project database without having to build it themselves.

Figure 6-2. A Project Cross-reference Database

Setting Up a Project Database

To create a project cross-reference database, you first need a comprehensive fileset for the programming project. To maintain consistency, the programmer in charge of checking in files for builds should make and maintain the fileset. If the source tree uses a consistent set of directories, the build programmer can use a shell script like Example 6-1 to update the fileset automatically.
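One way to automate that update, sketched under the assumption that the fileset sits in the project directory next to the database: regenerate it into a temporary file and move it into place only when every search succeeds, so that anyone reading it over NFS never sees a half-written list. The function name refresh_fileset and its arguments are illustrative, not part of the Static Analyzer, and the extension list simply mirrors Example 6-1.

```shell
# Rebuild <project-dir>/cvstatic.fileset from the source directories that
# follow it, replacing the old fileset only if every find succeeds.
# POSIX sh has no local variables, so the names carry an rf_ prefix.
refresh_fileset() {
    rf_project=$1
    shift
    rf_new="$rf_project/cvstatic.fileset.new"
    : > "$rf_new" || return 1
    for rf_dir in "$@"
    do
        # The patterns are quoted, so find (not the shell) matches them.
        find "$rf_dir" \( -name '*.c++' -o -name '*.c' -o -name '*.f' \) \
            -print >> "$rf_new" || { rm -f "$rf_new"; return 1; }
    done
    # A rename within one directory replaces the old fileset in one step.
    mv "$rf_new" "$rf_project/cvstatic.fileset"
}
```

If a source directory is missing, find fails, the temporary file is removed, and the previous fileset stays in place.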

Once the fileset is up to date, the build programmer creates a cross-reference database. Because building a cross-reference database for a large programming project can take a long time, you can save time by starting the Static Analyzer with the -batch command-line option. This option runs the Static Analyzer in the background, keeps the Static Analyzer window from opening, and reduces the time needed to create the cross-reference database.

It may be useful to run the Static Analyzer in batch mode on the server once a night, so that programmers who query the database from their own workstations always have a fresh copy. To protect the shared database from automatic modification by outside users, make sure that the permissions on all four Static Analyzer files on the server (cvstatic.fileset, cvstatic.xref, cvstatic.index, and cvstatic.posting) deny write access to outside users.
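The nightly rebuild and the permission check can be combined in one job. This is a hedged sketch: the function name rebuild_project_db and the default directory /project are hypothetical, and it assumes that cvstatic accepts -batch together with -fileset and writes the other three database files next to the fileset, which you should confirm for your installation.

```shell
# Hypothetical nightly rebuild of the shared project database.
# Assumption: "cvstatic -batch -fileset FILE" builds cvstatic.xref,
# cvstatic.index, and cvstatic.posting in the current directory.
# The body runs in a subshell so the cd does not affect the caller.
rebuild_project_db() (
    db_dir=${1:-/project}
    cd "$db_dir" || exit 1
    cvstatic -batch -fileset "$PWD/cvstatic.fileset" || exit 1
    # The owner keeps write access; group and other users may only read.
    chmod 644 cvstatic.fileset cvstatic.xref cvstatic.index cvstatic.posting
)
```

A script that calls rebuild_project_db can then be scheduled with an ordinary crontab entry under the build programmer's account, for example to run at 2 a.m. each night.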

Querying a Project Database

To query a project database from a personal workstation, you must first mount the directory that holds the project database on a local directory using NFS (the Network File System). You then start the Static Analyzer with command-line options that specify the project fileset and put the Static Analyzer in read-only mode, so that it won't try to modify the project database. For example, this command starts the Static Analyzer in read-only mode and directs it to the project fileset, which is NFS-mounted in the directory /project:

cvstatic -readonly -fileset /project/cvstatic.fileset

The first command-line option, -readonly, sets the Static Analyzer so that it won't try to rebuild the project database at any time. The second command-line option, -fileset, sets the fileset to cvstatic.fileset, which is NFS-mounted in the directory /project.
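Before starting a session, it can help to confirm that the NFS mount is actually in place, since an unmounted /project would otherwise surface only as a confusing Static Analyzer error. A minimal sketch, assuming the mount itself is already configured (for example, in /etc/fstab); the function name query_project_db is hypothetical, while the two options are the ones shown above.

```shell
# Start a read-only Static Analyzer session on the shared project database,
# after checking that all four database files are visible and readable.
query_project_db() {
    qp_dir=${1:-/project}
    for qp_file in cvstatic.fileset cvstatic.xref cvstatic.index cvstatic.posting
    do
        if [ ! -r "$qp_dir/$qp_file" ]
        then
            echo "missing or unreadable: $qp_dir/$qp_file (is it mounted?)" >&2
            return 1
        fi
    done
    cvstatic -readonly -fileset "$qp_dir/cvstatic.fileset"
}
```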

When you make queries on a large project database, use caution and common sense. Comprehensive queries such as "List All Functions" won't yield useful results; few people find it truly useful to see every function in millions of lines of code displayed at one time. Comprehensive queries like this may also take a good deal of time to complete. You'll find it much more productive to take a task-oriented approach when querying. Ask what you really need to know about the project, then make the most specific query that answers that question. For example, if you get a bug report on a function, you might use specific queries such as "Where Defined", "Who Calls", or "Who Is Called By" to get the information you need about that function.

Viewing Suggestions

If you do need to make comprehensive queries on a large database, consider using Text View for your queries. Because Text View doesn't require the Static Analyzer to build a tree containing thousands of elements, it's much faster at displaying the results of a comprehensive query than any of the tree views.

Text View doesn't show connections between calling and called functions in the query results area, but you can easily follow a chain of functions. First, click a function name you want, then press <Alt-B> to see which functions it calls, or press <Alt-C> to see which functions call it.

Tree views show relationships between query elements more clearly than Text View does, so you may want to use tree views to display the results of some queries. If so, you can reduce the time the Static Analyzer needs to display tree view results by observing a few limitations.

First, don't use either the "All Defined" or the "Complete Tree" view options, which display a huge set of elements in the query results area no matter how limited a query you make. Use the "Query Only" and "Incremental Mode" view options to restrict the number of elements displayed for each query.

In Incremental Mode, you can build a tree from scratch by making very specific queries that identify and then follow only the branch of the tree in which you're interested. For example, you may want to follow a chain of function calls starting with main(). If so, start with the query "Who Is Called By main?". Find a function among those called that you want to follow, then query the Static Analyzer for the functions called by that function. As you continue through the call chain, the Static Analyzer displays only the branch of the call tree that applies, not the entire tree.

You should also consider viewing query results in a tree view that offers coarser resolution than you normally use. For example, File Dependency View displays file elements, each of which may contain many functions. This is a much coarser view of the database than that offered by Call Tree View, which displays functions individually in function elements. If you make a query such as "Who Calls" while in File Dependency View, the Static Analyzer shows you each file that contains called functions. You can then open the Source View window for one of those files; it highlights each called function in its display area. The same query in Call Tree View would show you each called function in tree form, but would probably require many more elements to show query results and would take much longer to return results.