Writing An Open-Source C++ Static Analysis Tool
While there are many code analysis tools for C++, why not write one from scratch? This article introduces an open-source C++ static analysis tool that you might find useful or at least interesting.
This is a guest post from Greg Utas.
Greg was chief software architect of the call servers used in AT&T’s wireless network. He is the author of Robust Communications Software and the developer of the Robust Services Core (RSC). You can find Greg on LinkedIn and GitHub.
Background
The tool described in this article is built on RSC, an open-source framework for resilient C++ applications. This allows the tool to use RSC’s CLI, logging, and debugging capabilities.
The tool came about because, after I had been developing RSC for a while, I decided to tidy its #include directives, to remove headers that weren’t needed, and to include those that were only being picked up transitively but accidentally. Surely there was a tool that would do this.
Wrong. This was around 2013, and I did find a Google initiative called Include What You Use, but it had been mothballed (it was later resurrected). Because I could find no such tool at the time, I decided to develop one.
It quickly became clear that the tool needed to parse C++. But even that wouldn’t be enough. It needed to do many of the same things as a compiler, like name resolution.
Instead of deciding that the exercise was too overwhelming, I forged ahead. For as long as it lasted, it would be a learning experience. And it would provide a diversion when I didn’t feel like working on the main purpose of RSC, which is to provide a framework for resilient C++ applications, especially servers.
The tool grew organically, and its code was continually refactored. The parser was implemented using recursive descent, which results in code that is easy to understand and modify. The objects that the parser created to represent C++ items were added to their scope by a virtual EnterScope function. If they contained executable code, they were then “compiled” by a virtual EnterBlock function. To verify that the code had been properly understood, the tool could be told to emit pseudo-code for a stack machine.
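To give a feel for two of the ideas above, recursive descent and pseudo-code for a stack machine, here is a minimal sketch. It is not RSC’s parser: it handles only a toy arithmetic grammar, and every name in it (ExprParser, Parse, and the push/add/sub/mul/div mnemonics) is invented for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A toy recursive descent parser. All names are hypothetical; this is
// not RSC's parser. Grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')'
class ExprParser
{
public:
   explicit ExprParser(const std::string& text) : text_(text), pos_(0) { }

   // Returns stack machine pseudo-code for the expression, or an
   // empty vector if the input could not be parsed.
   std::vector<std::string> Parse()
   {
      pos_ = 0;
      code_.clear();
      if(!ParseExpr() || (Peek() != '\0')) code_.clear();
      return code_;
   }
private:
   bool ParseExpr()
   {
      if(!ParseTerm()) return false;
      for(char op = Peek(); (op == '+') || (op == '-'); op = Peek())
      {
         ++pos_;
         if(!ParseTerm()) return false;
         code_.push_back(op == '+' ? "add" : "sub");
      }
      return true;
   }

   bool ParseTerm()
   {
      if(!ParseFactor()) return false;
      for(char op = Peek(); (op == '*') || (op == '/'); op = Peek())
      {
         ++pos_;
         if(!ParseFactor()) return false;
         code_.push_back(op == '*' ? "mul" : "div");
      }
      return true;
   }

   bool ParseFactor()
   {
      char c = Peek();
      if(c == '(')
      {
         ++pos_;
         if(!ParseExpr() || (Peek() != ')')) return false;
         ++pos_;
         return true;
      }
      if(!std::isdigit(static_cast<unsigned char>(c))) return false;
      std::string number;
      while((pos_ < text_.size()) &&
         std::isdigit(static_cast<unsigned char>(text_[pos_])))
      {
         number.push_back(text_[pos_++]);
      }
      code_.push_back("push " + number);
      return true;
   }

   // Skips blanks and returns the next character, or '\0' at the end.
   char Peek()
   {
      while((pos_ < text_.size()) && (text_[pos_] == ' ')) ++pos_;
      return (pos_ < text_.size() ? text_[pos_] : '\0');
   }

   std::string text_;                // the source being parsed
   std::size_t pos_;                 // current position in text_
   std::vector<std::string> code_;   // emitted pseudo-code
};
```

Each grammar rule maps to one function, which is what makes this style easy to read and modify. Calling ExprParser("1 + 2 * (3 - 4)").Parse() yields the operands in the order they are needed, followed by the operators that consume them.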
After a while, it became possible to analyze #include directives and recommend additions and deletions. But why stop there? Because the tool knew most of what a compiler knows, it would also be easy to make recommendations about forward declarations and using statements. And to suggest deleting things that were unused. And, as the tool evolved, to highlight violations of all sorts of best practices, effectively acting as an automated Scott Meyers code inspector.
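The bookkeeping behind #include recommendations can be sketched roughly as follows. This is an assumption about the general approach, not the tool’s implementation: the names (Names, IncludeAdvice, AdviseIncludes) are hypothetical, each symbol is assumed to be declared by exactly one header, and the hard part, working out which symbols a file actually uses, is taken as given.

```cpp
#include <map>
#include <set>
#include <string>

using Names = std::set<std::string>;

// Hypothetical result type: headers to add and headers to remove.
struct IncludeAdvice
{
   Names add;
   Names remove;
};

// declaredBy maps each header to the symbols that it declares.
// included is the set of headers that the file includes directly.
// used is the set of symbols that the file references.
IncludeAdvice AdviseIncludes(const std::map<std::string, Names>& declaredBy,
   const Names& included, const Names& used)
{
   // Find the headers that declare at least one symbol that is used.
   Names needed;
   for(const auto& header : declaredBy)
   {
      for(const auto& symbol : header.second)
      {
         if(used.count(symbol) != 0)
         {
            needed.insert(header.first);
            break;
         }
      }
   }

   IncludeAdvice advice;

   // A needed header that is not included directly should be added.
   for(const auto& header : needed)
   {
      if(included.count(header) == 0) advice.add.insert(header);
   }

   // An included header that is not needed should be removed.
   for(const auto& header : included)
   {
      if(needed.count(header) == 0) advice.remove.insert(header);
   }

   return advice;
}
```

A header that is needed but only reached transitively ends up in the add set, which matches the article’s goal of including what is “only being picked up transitively but accidentally”.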
Although the tool generated many of the same warnings as commercially available tools, fixing them manually was tedious. So why not do it automatically? This wouldn’t be feasible for every warning, but it would be for many of them. The user would specify which warnings to fix, and the tool would then modify the code accordingly. Implementing this made the tool far more effective.
The tool also ended up doing other things, including
- displaying all of the compiled code in a canonical form,
- generating a global cross-reference, and
- analyzing code dependencies as an aid for restructuring.
But its principal purpose is still to clean up code, so let’s look at the typical workflow.
Workflow
First, the code to be analyzed must be imported:
>read buildlib
The > is RSC’s CLI prompt. The read command is told to read a script called buildlib, which imports the project’s code from a list of its directories.
Next, the code has to be compiled:
>parse - win64 $files
where
- parse is the command
- "-" indicates that no compiler options are required
- win64 is the target (others are win32 and linux)
- $files is a built-in variable that contains all the code files
The tool now calculates a global compile order and compiles all of the code together. As each file is compiled, its name is displayed. When a template is instantiated, its name and template arguments are also displayed. RSC currently contains about 235K lines of code. Compiling it on my laptop takes 2 minutes, about the same as an MSVC compile under VS2022.
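The article does not say how the compile order is calculated. One plausible approach, shown here only as a sketch, is a depth-first topological sort in which each file is emitted after everything that it #includes. The names (IncludeMap, Visit, CompileOrder) are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each file to the files that it #includes directly.
using IncludeMap = std::map<std::string, std::vector<std::string>>;

// Depth-first visit: emit everything that FILE includes, then FILE.
static void Visit(const std::string& file, const IncludeMap& includes,
   std::set<std::string>& visited, std::vector<std::string>& order)
{
   // If FILE was already visited, it has either been emitted or is
   // part of an #include cycle, which is broken at this point.
   if(!visited.insert(file).second) return;

   auto entry = includes.find(file);

   if(entry != includes.end())
   {
      for(const auto& inc : entry->second)
      {
         Visit(inc, includes, visited, order);
      }
   }

   order.push_back(file);
}

// Returns the files in an order where each one follows its #includes.
std::vector<std::string> CompileOrder(const IncludeMap& includes)
{
   std::set<std::string> visited;
   std::vector<std::string> order;

   for(const auto& entry : includes)
   {
      Visit(entry.first, includes, visited, order);
   }

   assert(order.size() == visited.size());  // each file emitted once
   return order;
}
```

For a Parser.cpp that includes Parser.h and Lexer.h, where Parser.h also includes Lexer.h, the result is Lexer.h, Parser.h, Parser.cpp.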
Now for a code inspection:
>check rsc $files
where
- check is the command
- rsc is the filename (which will be rsc.check.txt)
- $files is, again, all of the files
The resulting file lists all of the code warnings that the tool found. There are currently 148 different types of warnings, and the tool can fix 101 of them. For example:
>fix 17 f $files
where
- fix is the command
- 17 is warning W017: “Add #include directive”
- f is false, meaning don’t prompt before fixing each occurrence of the warning
- $files is, again, all of the files
The tool now edits all of the code files by inserting all of the #include directives that it recommended.
Two Examples
In CxxArea.h and CxxArea.cpp, change the first parameter to Class::CanConstructFrom from const StackArg& that to StackArg that:
bool CanConstructFrom(const StackArg& that, const string& thatType) const;
bool Class::CanConstructFrom(const StackArg& that, const string& thatType) const
{
// code
}
After recompiling (for real), launch RSC and check the code:
>read buildlib
>parse - win64 $files
>check rsc $files
The file rsc.check.txt (written to the directory …/rsc/excluded/output) now contains a new warning:
W087 Object could be passed by const reference
ct/CxxArea.h(418/1): (StackArg that, const std::string& thatType) const;
W087 is the warning number, 418 is the line number, and the /1 indicates that the warning is for the first parameter. Let’s fix it:
ct>fix 87 f cxxarea.h
Checking diffs after fixing code is recommended.
The following is also automatic in modified files:
o Whitespace at the end of a line is deleted.
o A repeated blank line is deleted.
o If absent, an endline is appended to the file.
CxxArea.h:
Line 418/1: Object could be passed by const reference
(StackArg that, const std::string& thatType) const;
CxxArea.cpp:
bool Class::CanConstructFrom(const StackArg& that, const string& thatType) const
CxxArea.h:
(const StackArg& that, const std::string& thatType) const;
End of warnings.
...CxxArea.h committed
...CxxArea.cpp committed
2 file(s) were changed
The original signature of the function’s declaration and definition has now been restored.
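Why W087 matters can be shown with a small sketch. The Arg class below is a hypothetical stand-in for a class such as StackArg; it counts how often it is copied so that the cost of passing by value becomes visible.

```cpp
#include <string>

// Hypothetical stand-in for a class that is not trivial to copy.
struct Arg
{
   static int copies;  // number of times the copy constructor ran
   std::string name;

   explicit Arg(const std::string& n) : name(n) { }
   Arg(const Arg& that) : name(that.name) { ++copies; }
};

int Arg::copies = 0;

// Pass by value: each call copies the caller's object. This is the
// form that a warning like W087 flags.
bool HasNameByValue(Arg that, const std::string& name)
{
   return (that.name == name);
}

// Pass by const reference: the function reads the caller's object
// directly, so nothing is copied.
bool HasNameByConstRef(const Arg& that, const std::string& name)
{
   return (that.name == name);
}
```

Calling HasNameByValue with an existing Arg object increments Arg::copies, whereas calling HasNameByConstRef leaves it unchanged. Because neither function modifies its argument, the two behave identically apart from the copy.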
Warning W020 is “Using statement in header”. rsc.check.txt contains many of these because I don’t always fix them when the header in question is only used within its own namespace. But let’s fix the one for CodeWarning.h:
ct>fix 20 f codewarning.h
Checking diffs after fixing code is recommended.
The following is also automatic in modified files:
o Whitespace at the end of a line is deleted.
o A repeated blank line is deleted.
o If absent, an endline is appended to the file.
CodeWarning.h:
Line 38: Using statement in header
using NodeBase::word;
OK.
End of warnings.
...CodeWarning.h committed
1 file(s) were changed.
If you now do a diff on CodeWarning.h, you will see that the using declaration for NodeBase::word has been erased and that two occurrences of word have been qualified by NodeBase::. Another occurrence of word was already qualified, so it was left unchanged.
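The shape of this fix can be illustrated with a small sketch. The namespaces and the Counter class below are hypothetical (Base stands in for NodeBase, and the definition of word is an assumption), so this shows the general before and after, not the actual contents of CodeWarning.h.

```cpp
#include <cstdint>

namespace Base  // hypothetical stand-in for NodeBase
{
   typedef std::intptr_t word;  // assumed definition, for illustration
}

namespace Tools
{
   // Flagged form: the header would contain
   //
   //    using Base::word;
   //
   // which makes the unqualified name "word" visible in every file
   // that includes the header.
   //
   // Fixed form: the using declaration is removed and each use of
   // the name is qualified instead.
   struct Counter
   {
      Base::word value;

      Base::word Next() { return ++value; }
   };
}
```

The code behaves the same either way. The difference is that the header no longer injects a name into the scope of the files that include it.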
Limitations
Can you use the tool? Yes, but there are a couple of limitations.
First, the tool only supports the C++ language features that RSC uses, which is a subset of C++11. To be honest, there hasn’t been much since then that I find compelling. Some of it is arguably more elegant, but so far I’ve been able to do everything I need with the subset of the language that the tool supports.
Second, anything from the STL and other external libraries has to be declared in parallel headers that are imported with the rest of the code. These headers only need to provide declarations, not definitions. This approach avoids having to import a variety of external directories, correctly navigate their #ifdefs, compile many things that the project doesn’t use, and support language features that only the external libraries need.
How serious are these limitations? It depends on your code. In some cases, it’s easy to change code so that the tool can understand it. In other cases, I’ve had to evolve the tool to support a language feature that I needed to use. As far as those parallel headers go, you only have to extend what is already declared to support RSC, which is a subset of the STL, as well as a smattering of Windows and Linux headers.
Advantages
I’ve used several static analysis tools, including Coverity, PVS-Studio, and clang-tidy. All of them are useful and have areas where they excel. The major advantage of this tool, besides being open source, is that it can actually fix problems instead of just complaining about them. clang-tidy can also do this to some extent, but I haven’t evaluated it. I use VS2022 with CMake, and it isn’t clear how to access that clang-tidy capability from that configuration.
What to Read Next
The motivation for this article is that the tool has become more than a diversion. It would be great to find other contributors who want to improve it so that it becomes useful to a wider range of projects.
The following documentation and files will give you a better idea of the tool’s capabilities and design:
Document | Description |
---|---|
rsc.check.txt | warnings found in RSC’s code |
cppcheck.txt | help file for the 148 warnings |
C++ Static Analysis Tools | introductory documentation |
C++11 Exclusions | the subset of C++ that the tool supports |
A Static Analysis Tool for C++ | an article with more details |
Parser.cpp | C++ recursive descent parser |
RSC’s ct directory | the tool’s source code (namespace CodeTools) |
RSC’s subs directory | parallel headers for external libraries |
I welcome your comments. RSC’s repository has a Discussions page, which would be a good venue for technical topics.
And finally, my thanks to Bartlomiej for generously offering to publish this article.