Chapter 6. Working on Large Programming Projects

The Static Analyzer works on uncompilable code, analyzes filesets containing files from completely different programs, and presents query results in a graphic form that is easy to browse. This flexibility can bring unproductive results, however, if you use the Static Analyzer carelessly on hundreds of thousands (or millions) of lines of code that are typical of a large programming project. To be effective, you must narrow your analysis to a meaningful portion of your project, or you may end up with results so extensive that they have little meaning.

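This chapter recommends techniques to help you get the best results when using the Static Analyzer for large programming projects. It covers the following topics:

  • Creating a Fileset Using a Shell Script

  • Customizing the Fileset for Individual Code Modules

  • Using the Results Filter to Focus Queries

  • Applying Group Analysis Techniques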

Creating a Fileset Using a Shell Script

Creating a fileset for a large programming project can be difficult to do by hand because the source code files may be scattered throughout many different directories. If so, you can use a shell script to create a fileset for you.

The following lines of code show a shell script that searches through a list of directories for file names with extensions that indicate source code files:

rm -f cvstatic.fileset
DIRS="/usr/local/src /usr/src"
EXTENSIONS="*.c++ *.c *.f"
set -f    # keep the shell from expanding the patterns before find sees them
for DIR in $DIRS
do
     for EXT in $EXTENSIONS
     do
           find "$DIR" -name "$EXT" -print >> cvstatic.fileset
     done
done

The first line removes the old fileset. The second line sets DIRS to the list of directories you want searched. Put the pathname of each directory you want searched between the quotes following DIRS, and put a space between pathnames.

The third line creates a list of the file extensions for which you want to search. Use shell metacharacters to create list entries. In this example, the script looks for any filenames that end in .c++, .c, or .f. To create an extension list that looks for different extensions, use shell metacharacters to spell out the extensions you want, and put the entries between the two quotes following EXTENSIONS. Be sure to put a space between each entry.

The nested loop at the end of the script looks through each directory in the DIRS list and returns any files that match the patterns in EXTENSIONS. It appends the names of all returned files to the file cvstatic.fileset in a form that the Static Analyzer reads as a fileset.

Once you create a fileset with a shell script, you should look at the fileset before you make any queries. If you find libraries included in the fileset, you may want to remove them so that you don't have to analyze the internal workings of each library function. You may also want to remove all files that do not apply to your specific area of the project.
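The pruning described above can also be scripted. The following is a minimal sketch, assuming the fileset is a plain list with one pathname per line, as the script earlier in this section produces. The function name prune_fileset and the example patterns are hypothetical, not part of the Static Analyzer.

```shell
# prune_fileset FILESET PATTERN...
# Hypothetical helper: removes every line of FILESET that matches any of
# the given grep patterns. The unpruned list is kept in FILESET.bak.
prune_fileset() {
    fileset=$1
    shift
    cp "$fileset" "$fileset.bak" || return 1
    for pattern in "$@"
    do
        # grep -v exits nonzero when no lines survive; that is not an error here.
        grep -v -e "$pattern" "$fileset" > "$fileset.tmp" || true
        mv "$fileset.tmp" "$fileset"
    done
}
```

For example, prune_fileset cvstatic.fileset '/lib/' '/test/' would drop every pathname that contains a lib or test directory. Review the result by hand afterward, because a pattern that is too broad can also remove files you want to keep.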

Customizing the Fileset for Individual Code Modules

Most programming projects divide their source code into modules, with individual programmers taking responsibility for different sets of modules. The Static Analyzer allows you to analyze each module separately, even if the module will not compile without other parts of the system. You can see your own code in detail and see calls into other modules without having to view the contents of those modules. You also reduce the size of the cross-reference database with which you work, which shortens the time the Static Analyzer takes to refresh the database and to complete queries of the database.
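One way to build such a reduced fileset is to search only the directories that hold your own modules. The following sketch assumes each module lives in its own directory and uses the same three extensions as the script earlier in this chapter. The function name module_fileset and the directory names in the usage note are hypothetical.

```shell
# module_fileset OUTFILE DIR...
# Hypothetical helper: writes to OUTFILE the pathnames of the source files
# found under the listed module directories, and nothing else.
module_fileset() {
    out=$1
    shift
    # find accepts several starting directories; -o joins the name patterns.
    find "$@" \( -name '*.c++' -o -name '*.c' -o -name '*.f' \) -print > "$out"
}
```

For example, module_fileset cvstatic.fileset /usr/src/ui /usr/src/net would list only the files in those two module directories. Calls from these files into other modules still appear in query results, but the other modules' source files are not analyzed.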

Using the Results Filter to Focus Queries

Once you create a reduced fileset, you can further improve the efficiency of your analysis by setting the Static Analyzer's Results Filter. The Results Filter's Headers and External Functions settings are particularly useful for large programming projects.

If you set Headers to Exclude, you prevent the Static Analyzer from taking time to display query results that come from header files. And, if you set External Functions to Exclude, you ensure that the Static Analyzer does not display query results from libraries and other nonfileset files.

For example, consider the function foo(), which calls bar(), a function in the fileset. It also calls XtCreateWidget(), a library function that is not in the fileset. If you set External Functions to Exclude and then make the query Who Is Called By foo?, the Static Analyzer displays only bar().

Although the Results Filter does not reduce the time the Static Analyzer takes to make a query, it does reduce the time it takes to display the results, a substantial gain if you are using a tree view to display the results of comprehensive queries.

Applying Group Analysis Techniques

Although it is good practice for individual programmers to limit the amount of source code they analyze with the Static Analyzer to just the modules for which they are responsible, sometimes it is necessary to analyze all files in a programming project. For example, library programmers may want to know every function that calls a specific library function. That way, they know what software is affected by changes they make to the library function.

For this and similar cases, you should create a comprehensive cross-reference database on a project workstation as shown in Figure 6-1. This arrangement allows users on personal workstations to query the extensive project database without actually creating the database.

Figure 6-1. A Project Cross-Reference Database

Setting Up a Project Database

To create a project cross-reference database, you first need a comprehensive fileset for the programming project. To maintain consistency, the programmer in charge of checking in files for builds should make and maintain the fileset. If the source tree uses a consistent set of directories, the build programmer can use a shell script like the example earlier in this chapter to update the fileset automatically.

Once the fileset is up to date, the build programmer creates a cross-reference database. Because it can take a long time to create a cross-reference database for a large programming project, you can save time by using the -batch command-line option when you start the Static Analyzer. This option runs the Static Analyzer in the background, keeps the Static Analyzer window from opening, and reduces the time necessary to create a cross-reference database.

It may be useful to run the Static Analyzer in batch mode on the server once a night. This provides a fresh database for programmers who wish to query it from their own workstations. To protect the shared database from automatic modification by outside users, be sure that the permissions on all four Static Analyzer files on the server (cvstatic.fileset, cvstatic.xref, cvstatic.index, and cvstatic.posting) deny write access to those users.
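The permission change can be sketched as a small shell function that the build programmer runs after each database rebuild. It assumes the four database files sit together in one directory and that outside users reach them through group or other permissions. The function name protect_database is hypothetical.

```shell
# protect_database DIR
# Hypothetical helper: removes group and other write permission from the
# four Static Analyzer files in DIR. The owner's permissions are unchanged,
# so the build programmer can still rebuild the database.
protect_database() {
    dir=$1
    for name in cvstatic.fileset cvstatic.xref cvstatic.index cvstatic.posting
    do
        if [ -f "$dir/$name" ]
        then
            chmod go-w "$dir/$name" || return 1
        fi
    done
}
```

Calling protect_database /project at the end of the nightly job keeps the files read-only for everyone but their owner.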

Querying a Project Database

To query a project database from a personal workstation, you must first mount the project database in a local directory using the Network File System (NFS). You then start the Static Analyzer using command-line options that specify the project fileset and set the Static Analyzer to read-only so that it will not try to modify the project database. For example, the following command starts the Static Analyzer, sets it to read-only, and directs it to the project fileset, which is NFS-mounted in the directory /project:

% cvstatic -readonly -fileset /project/cvstatic.fileset

The -readonly command-line option sets the Static Analyzer so that it will not try to rebuild the project database. The -fileset command-line option sets the fileset to cvstatic.fileset, which is NFS-mounted in the directory /project.
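If the NFS mount is sometimes unavailable, you may want to wrap the command in a function that checks for the fileset first. The following is a sketch only: the function name query_project is hypothetical, and the default path assumes the /project mount point used in the example above.

```shell
# query_project [FILESET]
# Hypothetical wrapper: starts the Static Analyzer read-only on the project
# fileset, but only if the fileset can be read from this workstation.
query_project() {
    fileset=${1:-/project/cvstatic.fileset}
    if [ ! -r "$fileset" ]
    then
        echo "cannot read $fileset; is the project directory mounted?" >&2
        return 1
    fi
    cvstatic -readonly -fileset "$fileset"
}
```

The check costs nothing when the mount is present and gives a clearer message than a failed startup when it is not.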

When you make queries on a large project database, use caution and common sense. Comprehensive queries such as List All Functions will not yield useful results as too much code is displayed at one time. Comprehensive queries like this may also take a good deal of time to complete. It is more productive to take a task-oriented approach when querying. Ask what you really need to know in the project, then make the most specific query that answers your questions. For example, if you get a bug report on a function, you might use specific queries such as Where Defined, Who Calls, or Who Is Called By to get the information you need about that function.

Viewing Suggestions

If you need to make comprehensive queries on a large database, consider the following viewing tips:

  • Use Text View for your queries. Because Text View does not require the Static Analyzer to build a tree containing thousands of elements, it is much faster at displaying the results of a comprehensive query than any of the tree views.

    Although Text View does not show connections between calling and called functions in the query results area, you can easily follow a chain of functions. First, click the function name you want. Then press Alt-B to see which functions it calls or press Alt-C to see which functions call it.

  • Because the tree views show relationships between query elements more clearly than Text View, you may want to use tree views to display the results of some queries. If so, you can reduce the time the Static Analyzer needs to display tree view results by observing a few limitations.

    Use the Query Only and the Incremental Mode viewing options to restrict the number of elements displayed for a query.

    In Incremental Mode, you can build a tree from scratch by making very specific queries that identify and follow only the branch of the tree in which you are interested. For example, you may want to follow a chain of function calls starting with main(). If so, start with the query Who Is Called By main?. Find a function among those called that you want to follow, then query the Static Analyzer for the functions called by that function. As you continue through the call chain, the Static Analyzer displays only the branch of the call tree that applies, not the entire tree.

  • You should also consider viewing query results in a tree view that offers coarser resolution than you normally use. For example, File Dependency View displays file elements, each of which may contain many functions. This is a much coarser view of the database than that offered by Call Tree View, which displays functions individually in function elements. If you make a query such as Who Calls while in File Dependency View, the Static Analyzer shows you each file that contains called functions. You can then open the Source View window for one of those files; it highlights each called function in its display area. The same query in Call Tree View would show you each called function in tree form, but would probably require many more elements to show query results and would take much longer to return results.