How to write your own checkers
This is a translation done by ChatGPT.
Note: In this document, // is used to denote the root directory of the repository.
The process of creating a custom rule set is as follows:
- Create the basic files for the rule set and make related changes.
- Write custom rule checkers.
- Integrate into the container image.
- In the root directory of the repository, create a new folder named after the rule set (e.g., //toy_rules) and add a .gitignore file (refer to //toy_rules/.gitignore).
- Add a Run function for this rule set (refer to //toy_rules/analyzer/run.go) and call it in //misra/analyzer/cmd/main.go:

func selectRun(rulePrefix string) (runFuncType, error) {
    switch rulePrefix {
    ...
    case "toy_rules":
        return toy_rules.Run, nil
    ...
    }
}

- Add the newly created rule set and its language (C/C++/both) to ruleSets in //misra/analyzer/cmd/main.go. This allows the image to find the new rule set.
- If testing with go test, also add the new rule set to checkingStandards in //cruleslib/testlib/testlib.go so that testlib can find it.
Take //toy_rules/rule_1 as an example; its folder structure looks like this:
rule_1
├── _bad0001
│   ├── bad.cc
│   ├── expected.textproto
│   └── Makefile
├── _good0001
├── libtooling
├── rule_1_test.go
└── rule_1.go
In this structure, _bad0001 and _good0001 are folders for test cases: _bad0001 contains non-compliant test cases and _good0001 contains compliant ones. expected.textproto specifies the expected test results, including the locations and content of the errors, and is written by the developer. The format looks like:
results {
  path: "bad.cc"
  line_number: 6
  error_message: "NULL should not be used as an integer value"
}
During testing, if the actual result matches the expected result, the test passes. Note that even compliant test data needs an empty expected.textproto.
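Conceptually, the pass/fail decision can be sketched as follows (a minimal Python sketch of the idea only, not the project's actual Go test code; the real comparison is performed by the testlib helpers):

```python
# Illustrative sketch: the test passes only when the set of actual results
# exactly matches the set of expected results.
def results_match(expected, actual):
    """Both arguments are lists of (path, line_number, error_message)
    tuples. Because the comparison is exact, a compliant test case must
    produce an empty results list to match its empty expected.textproto."""
    return sorted(expected) == sorted(actual)

expected = [("bad.cc", 6, "NULL should not be used as an integer value")]
actual = [("bad.cc", 6, "NULL should not be used as an integer value")]
assert results_match(expected, actual)   # _bad0001: reported as expected
assert results_match([], [])             # _good0001: empty vs. empty passes
assert not results_match(expected, [])   # a missed report fails the test
```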
The libtooling folder contains the libtooling implementation for the rule. If there is no libtooling implementation, this folder does not exist.
rule_1.go contains the logic for calling the checker binary. It calls different tool runners from //cruleslib/runner/runner.go, such as RunLibtooling, RunCppcheck, etc., to specify which checker to run.
rule_1_test.go contains the logic for go test testing, for example:
func TestBad0001(t *testing.T) {
    tc := testcase.NewWithSystemHeader(t, "_bad0001")
    tc.ExpectOK(testlib.ToTestResult(Analyze(tc.Srcdir, tc.Options)))
}
ExpectOK means that the result of the Analyze function matches the content of expected.textproto, while ExpectFailure indicates a mismatch. NewWithSystemHeader adds some system library paths when creating a test case.
- Add toy_rules_deps and bigmain_toy_rules in //podman_image/bigmain/BUILD:
cc_library(
    name = "toy_rules_deps",
    deps = [
        "//toy_rules/rule_1/libtooling:rule_1_lib",
        # Additional custom rules can be added here
    ],
)

cc_binary(
    name = "bigmain_toy_rules",
    srcs = ["main.cc"],
    deps = [
        ":rule",
        ":toy_rules_deps",
        "//libtooling_includes:cmd_options",
        "@com_github_google_glog//:glog",
        "@com_google_absl//absl/strings:str_format",
    ],
)
- Add the following to //podman_image/bigmain_symlink:
mkdir /opt/naivesystems/toy_rules
ln -s /opt/naivesystems/bigmain /opt/naivesystems/toy_rules/rule_1
- Create a new file //podman_image/Containerfile.toyrules. If the new rule set only checks C, base the image on misrac; if it checks C++, base it on misracpp. Then copy bigmain_toy_rules into the image. If a Chinese image is needed, use dev instead of dev_en for base_tag.
ARG base_tag=dev
FROM naive.systems/analyzer/misracpp:${base_tag}
COPY "bigmain_toy_rules" "/opt/naivesystems/bigmain"
- Finally, add bigmain_toy_rules and build-toy-rules-en targets in //podman_image/Makefile to generate the image.
After these steps, run make build-toy-rules-en in //podman_image to obtain a container image containing the new rule set. Then add toy_rules/rule_1 to the .naivesystems/check_rules file of the project to be tested, and run the following command to perform static code analysis with the generated image:
podman run -v $PWD:/src:O -v $PWD/.naivesystems:/config:Z \
-v $PWD/output:/output:Z -w /src \
naive.systems/analyzer/toyrules:dev_en \
/opt/naivesystems/misra_analyzer -show_results -alsologtostderr
Writing a checker comes down to selecting and implementing a suitable tool, then processing its output into a well-formed resultsList.
NaiveSystems Analyze checks project code for violations, including but not limited to resource leaks, memory overflows, and stack address escapes. For example, the following code
long l = 100000;
int8_t i = 0;
i = l;
This code may have a precision-loss problem. As another example, the following code
int i = 8 / 0;
This code has a division by zero error.
We generally divide the problems to be checked into two categories. One is STU (single translation unit), which are errors that can be detected within a single translation unit, like the precision loss mentioned above. The other is CTU (cross translation unit), which are errors that need to be checked across multiple translation units. For example, for a division by zero error, there might be a case like this:
// test.h
int getDiv(int a, int b);

// test.cc
int getDiv(int a, int b) {
    return a / b;
}

// main.cc
#include "test.h"
int main() {
    getDiv(10, 0);
    return 0;
}
Detecting this error requires information from both the test.cc and main.cc translation units.
From another perspective, we can also divide problems into two types: those that can be directly identified on the AST and those that require deep analysis to resolve. For example, for the precision loss issue:
void test(void)
{
    long l = 100000;
    int i = 1;
    i = l;
}
We use Clang to dump the AST:
$ clang -Xclang -ast-dump -fsyntax-only test.c
`-FunctionDecl 0x1208e23e8 <test.c:1:1, line:6:1> line:1:6 test 'void (void)'
  `-CompoundStmt 0x1208e26e8 <line:2:1, line:6:1>
    |-DeclStmt 0x1208e2588 <line:3:5, col:20>
    | `-VarDecl 0x1208e24e8 <col:5, col:14> col:10 used l 'long' cinit
    |   `-ImplicitCastExpr 0x1208e2570 <col:14> 'long' <IntegralCast>
    |     `-IntegerLiteral 0x1208e2550 <col:14> 'int' 100000
    |-DeclStmt 0x1208e2640 <line:4:5, col:14>
    | `-VarDecl 0x1208e25b8 <col:5, col:13> col:9 used i 'int' cinit
    |   `-IntegerLiteral 0x1208e2620 <col:13> 'int' 1
    `-BinaryOperator 0x1208e26c8 <line:5:5, col:9> 'int' '='
      |-DeclRefExpr 0x1208e2658 <col:5> 'int' lvalue Var 0x1208e25b8 'i' 'int'
      `-ImplicitCastExpr 0x1208e26b0 <col:9> 'int' <IntegralCast>
        `-ImplicitCastExpr 0x1208e2698 <col:9> 'long' <LValueToRValue>
          `-DeclRefExpr 0x1208e2678 <col:9> 'long' lvalue Var 0x1208e24e8 'l' 'long'
From the AST, we can see that there is an ImplicitCastExpr from long to int, indicating a code violation. The main characteristic of this type of problem is that the violation can be observed directly from the code structure.
However, for division-by-zero problems, the divisor could be the result of a complex calculation, like:
int d = 5;
int i = 10 / (d - d);
Here, we only need to compute the result of d - d to detect the error. In more complex cases, this computation can be very intricate, or the value might come from somewhere else entirely, requiring deep analysis to resolve the issue.
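The kind of reasoning needed here can be illustrated with a toy constant folder (a purely illustrative Python sketch; the real tools use path-sensitive symbolic execution rather than literal folding):

```python
import ast

def divides_by_zero(source):
    """Toy detector: constant-fold simple integer expressions in the source
    and report True if any division's right operand folds to zero.
    Purely illustrative; real analyzers (CSA, Infer) track symbolic values
    along program paths instead of folding literals."""
    env = {}  # variable name -> known constant value
    tree = ast.parse(source)

    def fold(node):
        # Evaluate a node to a constant if possible, else return None.
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env.get(node.id)
        if isinstance(node, ast.BinOp):
            left, right = fold(node.left), fold(node.right)
            if left is None or right is None:
                return None
            if isinstance(node.op, ast.Sub):
                return left - right
            if isinstance(node.op, ast.Add):
                return left + right
            return None
        return None

    for node in ast.walk(tree):
        # Record constant assignments like "d = 5".
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            value = fold(node.value)
            if value is not None:
                env[node.targets[0].id] = value
        # Flag any division whose divisor folds to zero.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
            if fold(node.right) == 0:
                return True
    return False

assert divides_by_zero("d = 5\ni = 10 / (d - d)")   # d - d folds to 0
assert not divides_by_zero("d = 5\ni = 10 / d")     # divisor folds to 5
```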
For problems that can be directly concluded from the AST, we generally use tools like libtooling, cppcheck, ClangSema, ClangTidy, etc. For problems that require deep analysis to solve, we generally use CSA and Infer.
libtooling is an official Clang tool, which can be understood as a kind of Clang plugin mechanism. Users write AST matchers to match the desired AST nodes, and a registered callback function is then invoked automatically to process each match.
For instance, for MISRA C++:2008 4.10.1 NULL must not be used as an integer value, the code we write is as follows:
class Callback : public MatchFinder::MatchCallback {
 public:
  void Init(ResultsList* results_list, MatchFinder* finder) {
    results_list_ = results_list;
    finder->addMatcher(
        implicitCastExpr(hasSourceExpression(expr(gnuNullExpr())),
                         hasImplicitDestinationType(isInteger()),
                         unless(isExpansionInSystemHeader()))
            .bind("cast"),
        this);
  }

  void run(const MatchFinder::MatchResult& result) override {
    const Expr* expr = result.Nodes.getNodeAs<Expr>("cast");
    string error_message = "NULL must not be used as an integer value";
    string path = GetFilename(expr, result.SourceManager);
    int line = GetLine(expr, result.SourceManager);
    AddResultToResultsList(results_list_, path, line, error_message);
  }

 private:
  ResultsList* results_list_;
};
This code matches every implicitCastExpr whose source expression is a gnuNullExpr and whose implicit destination type isInteger, excluding matches in system headers (isExpansionInSystemHeader). After a match is found, the callback function run is automatically called to report the error.
This is an STU-type problem, so the checker is invoked with checker_integration.Libtooling_STU to specify the type:
runner.RunLibtooling(srcdir, "misra_cpp_2008/rule_4_10_1", checker_integration.Libtooling_STU, opts)
The full code is at //misra_cpp_2008/rule_4_10_1.
For more details on how to write a libtooling checker, please refer to the official libtooling documentation and related tutorials.
Regarding the structure of the libtooling folder, take //toy_rules/rule_1 as an example:
rule_1
├── libtooling
│   ├── BUILD
│   ├── checker.cc
│   ├── checker.h
│   ├── lib.h
│   ├── main.cc
│   └── rule_1.cc
- checker.h and checker.cc contain the specific implementation of the checker.
- BUILD contains the Bazel build definition.
- main.cc contains the logic for invoking the libtooling checker, parsing arguments, and writing the analysis results to a specified file.
Apart from the checker's specific implementation logic, the other parts are generally similar across all implementations. During development, you can run bazel build rule_1 in this folder to generate a callable binary.
cppcheck is an open-source tool that has been checked into our codebase at //third_party/cppcheck. After compilation, a binary named cppcheck is generated in this folder.
When we analyze code, we first use this binary to generate a dumpfile, a file information structure similar to the Clang AST, and then use a script to analyze this dumpfile. For example, suppose we have the code
#include <clocale> // Non-compliant
int main() {
return 0;
}
First, generate a dumpfile:
~/analyze/third_party/cppcheck/cppcheck --dump main.cpp
Then run the following commands to get the analysis results:
~/analyze/third_party/cppcheck/cppcheck --abspath --dump --std=c99 --dump-file=main.cpp.c99.dump main.cpp
~/analyze/third_party/cppcheck/addons/misra.py --check_rules=misra_c_2012/rule_15_1 --output_dir output main.cpp.c99.dump
These commands are generally used during development; actual calls are wrapped in the runner:
runner.RunCppcheck(srcdir, "misra_c_2012/rule_15_1", checker_integration.Cppcheck_STU, opts)
It should be noted that cppcheck cannot generate a dumpfile for files containing only preprocessor directives; when writing test cases, an additional main function needs to be added.
Our cppcheck implementations all live in //third_party/cppcheck/addons/misra.py; you can search for the name of the relevant rule (such as rule_15_1) to view the corresponding checker.
Using //toy_rules/rule_2 as an example, some calling logic needs to be added in //third_party/cppcheck/addons/misra.py.
If toy_rules is a rule set that has never appeared in misra.py before, you need to:
- Create a new function executeToyRuleCheck to execute the checker.
def executeToyRuleCheck(self, check_function, *args):
    check_function(*args)
- Create a new class ToyRuleResult to encapsulate the checker's results.
class ToyRuleResult:
    def __init__(self, path, line_num, err_msg, other_locations=None):
        self.path = path
        self.line_number = line_num
        self.error_message = f'{err_msg}'
        self.locations = [ErrorLocation(path, line_num)]
        if other_locations is not None:
            for loc in other_locations:
                self.locations.append(ErrorLocation(loc.file, loc.linenr))
- Create a new function reportToyRuleError to add results to the JSON list and print results to stdout/stderr. The specific format of error_message can be defined according to your needs; here, only an error_id is passed.
def reportToyRuleError(self, location, rule_num, other_locations=None):
    if self.settings.verify:
        self.verify_actual.append('%s:%d %d' % (location.file, location.linenr, rule_num))
    else:
        error_id = f"Rule-{rule_num}"
        toyrule_severity = 'Required'
        this_violation = '{}-{}-{}-{}'.format(location.file, location.linenr, location.column, rule_num)
        # If this is a new violation then record it and show it. If not then
        # skip it since it has already been displayed.
        if this_violation not in self.existing_violations:
            self.existing_violations.add(this_violation)
            self.current_json_list.append(ToyRuleResult(location.file, location.linenr, error_id, other_locations))
            cppcheckdata.reportError(location, toyrule_severity, "", "toy", error_id)
            if toyrule_severity not in self.violations:
                self.violations[toyrule_severity] = []
            self.violations[toyrule_severity].append('toy' + "-" + error_id)
The JSON list of results will be located at //toy_rules/rule_2/_bad0001/output/tmp/test_run/test_run-*/cppcheck_out.json and will look something like:
[
    {
        "path": "/home/username/analyze/toy_rules/rule_2/_bad0001/bad.c",
        "line_number": 3,
        "error_message": "Rule-2",
        "locations": [
            {
                "path": "/home/username/analyze/toy_rules/rule_2/_bad0001/bad.c",
                "line_number": 3
            }
        ]
    }
]
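Consuming this output can be sketched as follows (a hypothetical Python snippet; the field names follow the example above, but the actual result handling is done by the Go runner):

```python
import json

# Hypothetical sketch of how the JSON results emitted by the misra.py
# addon could be consumed; field names match the example output above.
raw = """
[
  {
    "path": "/src/bad.c",
    "line_number": 3,
    "error_message": "Rule-2",
    "locations": [{"path": "/src/bad.c", "line_number": 3}]
  }
]
"""

def load_results(text):
    """Turn the addon's JSON list into (path, line, message) tuples."""
    return [(r["path"], r["line_number"], r["error_message"])
            for r in json.loads(text)]

results = load_results(raw)
assert results == [("/src/bad.c", 3, "Rule-2")]
```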
If the rule set already exists, add the implementation of a checker for a specific rule in the rule set:
- Add the implementation function toy_rule_2 (for reference, see MISRA C:2012 15.1: the goto statement should not be used).
def toy_rule_2(self, data):
    for token in data.tokenlist:
        if token.str == "goto":
            self.reportToyRuleError(token, 2)
- Add the function toy_rule_2 to the parseDump list. Pass the required parameters, such as cfg, data.rawTokens, or the dumpfile, based on the rule's requirements.
if "toy_rules/rule_2" in rules_list or check_rules == "all":
    self.executeToyRuleCheck(self.toy_rule_2, cfg)
Afterward, the runner can be used to call the corresponding cppcheck implementation for toy_rules/rule_2 and check its output:
runner.RunCppcheck(srcdir, "toy_rules/rule_2", checker_integration.Cppcheck_STU, opts)
ClangSema is a tool we use to check certain issues through Clang diagnostic flags. These are typically the issues that the Clang plugin in VSCode flags in the code, or that produce warnings during compilation. For instance, for "AUTOSAR A5-3-3: Pointers to incomplete class types shall not be deleted," a warning like this is generated during compilation:
bad1.cpp:6:13: warning: deleting pointer to incomplete type 'C::Impl' may cause undefined behavior [-Wdelete-incomplete]
    delete pimpl;
    ^      ~~~~~
We can directly use the -Wdelete-incomplete diagnostic flag to invoke ClangSema in Analyze and check whether the returned error contains the relevant keywords:
results, err := runner.RunClangSema(srcdir, "-Wdelete-incomplete", opts)
The complete example is at //autosar/rule_A5_3_3
.
The list of issues that can be checked by ClangSema is at DiagnosticsReference.
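The keyword matching described above can be sketched like this (an illustrative Python snippet; the actual parsing lives in the Go runner, and the regex here is an assumption about the common one-line Clang diagnostic format):

```python
import re

# Illustrative sketch: parse Clang diagnostic lines of the form
#   file:line:col: warning: message [-Wflag]
# and keep only the diagnostics for the flag we asked Sema to emit.
DIAG_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):\d+: warning: "
    r"(?P<msg>.*) \[(?P<flag>-W[\w-]+)\]$"
)

def filter_diagnostics(output, flag):
    results = []
    for line in output.splitlines():
        m = DIAG_RE.match(line)
        if m and m.group("flag") == flag:
            results.append((m.group("path"), int(m.group("line")), m.group("msg")))
    return results

sample = ("bad1.cpp:6:13: warning: deleting pointer to incomplete type "
          "'C::Impl' may cause undefined behavior [-Wdelete-incomplete]")
assert filter_diagnostics(sample, "-Wdelete-incomplete") == [
    ("bad1.cpp", 6,
     "deleting pointer to incomplete type 'C::Impl' may cause undefined behavior")
]
```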
ClangTidy is another tool provided by Clang, implemented internally on top of libtooling. It essentially ships with pre-written libtooling checkers that we can use directly:
runner.RunClangTidy(srcdir, args, opts)
Here, args is a string list that allows checking the same rule with multiple checkers, as in the example at //autosar/rule_A7_5_2.
The list of issues that can be checked by ClangTidy is available at https://clang.llvm.org/extra/clang-tidy/checks/list.html.
Please read the CSA (Clang Static Analyzer) official documentation first. In summary, CSA is a path-sensitive interprocedural analysis tool based on symbolic execution technology, used to solve problems dependent on state, such as the aforementioned division by zero issue.
We have also checked CSA into our code repository. You can find the currently implemented checkers in //third_party/llvm-project/clang/lib/StaticAnalyzer/Checkers; some are native to Clang, while those named after rule sets are our own implementations.
For example, in DivZeroChecker, checkPreStmt is a callback function that CSA calls automatically every time it encounters a new Stmt. Here, we check whether the BinaryOperator is a division operation and, if so, take its rhs, the divisor. We then use the ConstraintManager to determine whether the rhs could be zero; the ConstraintManager and CheckerContext store the state of variables and symbols, and an error is reported if zero is possible.
After implementing a specific checker, it can be invoked with -analyzer-checker; multiple CSA checkers can be enabled simultaneously, separated by commas.
runner.RunCSA(srcdir, "-analyzer-checker=core.DivideZero", opts)
Relevant examples are in //autosar/rule_A5_6_1 and //autosar/rule_A0_4_4.
Infer is a tool similar to CSA, and the list of issues it can check is at all-issue-types.
We invoke it using the corresponding issue type:
runner.RunInfer(srcdir, "--liveness", opts)
A relevant example is in //misra_c_2012_crules/rule_2_2.
Some issues can be resolved simply by processing the files as strings, for example //autosar/rule_A13_6_1.
We have also integrated other tools, such as analyzing based on errors or warnings from Clang (RunClangForErrorsOrWarnings), GCC (RunGCC), and Cpplint (RunCpplint). For more details, refer to //cruleslib/runner/runner.go.
If only one checker is used, the resultsList generated by the runner typically meets the requirements. If multiple checkers are needed to check the same rule, simply combine the results from all checkers at the end. A related example can be found in //misra_c_2012_crules/rule_2_2.