
How to write your own checkers

Mengting Chen edited this page Dec 4, 2023 · 16 revisions

This is a translation done by ChatGPT.

Note: In this document, // is used to denote the root directory of the repository.

Creating Custom Rule Sets

The process of creating a custom rule set is as follows:

  1. Create the basic files for the rule set and make related changes.
  2. Write custom rule checkers.
  3. Integrate into the container image.

Creating the Basic Files for the Rule Set and Related Changes

  1. In the root directory of the repository, create a new folder named after the rule set (e.g., //toy_rules) and add a .gitignore file (refer to //toy_rules/.gitignore).

  2. Add a Run function for this rule set (refer to //toy_rules/analyzer/run.go) and call it in //misra/analyzer/cmd/main.go:

    func selectRun(rulePrefix string) (runFuncType, error) {
        switch rulePrefix {
        ...
        case "toy_rules":
            return toy_rules.Run, nil
        ...
        }
    }
  3. Add the newly created rule set and its language (C/C++/both) to ruleSets in //misra/analyzer/cmd/main.go. This allows the image to find the new rule set.

  4. If testing with go test, add the new rule set in checkingStandards in //cruleslib/testlib/testlib.go to allow testlib to find the new rule set.
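Conceptually, selectRun in step 2 is just a table from rule-set prefix to its Run function. A language-neutral sketch in Python (names hypothetical; the real dispatch is the Go switch shown above):

```python
def select_run(rule_prefix, run_funcs):
    # run_funcs maps a rule-set prefix (e.g. "toy_rules") to its Run
    # function; an unknown prefix is an error, mirroring selectRun.
    if rule_prefix not in run_funcs:
        raise ValueError(f"unknown rule set: {rule_prefix}")
    return run_funcs[rule_prefix]
```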

Writing Custom Rule Checkers

Take //toy_rules/rule_1 as an example; its folder structure looks like this:

rule_1
├── _bad0001
│   ├── bad.cc
│   ├── expected.textproto
│   └── Makefile
├── _good0001
├── libtooling
├── rule_1_test.go
└── rule_1.go

In this structure, _bad0001 and _good0001 are folders for test cases: _bad0001 contains non-compliant test cases and _good0001 contains compliant ones. expected.textproto, written by the developer, specifies the expected test results, including the locations and contents of the errors. Its format looks like:

results {
    path: "bad.cc"
    line_number: 6
    error_message: "NULL should not be used as an integer value"
}

During testing, if the actual result matches the expected result, the test passes. Note that even compliant test data needs an empty expected.textproto.
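The pass/fail decision boils down to comparing the actual results with the entries in expected.textproto. A minimal sketch of that comparison in Python, with hypothetical field names mirroring the textproto above (the real comparison lives in testlib):

```python
def results_match(actual, expected):
    # Compare two results lists order-insensitively; each result is a
    # dict with path, line_number, and error_message fields.
    def key(r):
        return (r["path"], r["line_number"], r["error_message"])
    return sorted(map(key, actual)) == sorted(map(key, expected))

expected = [{"path": "bad.cc", "line_number": 6,
             "error_message": "NULL should not be used as an integer value"}]
```

A compliant test case simply has an empty expected list, which matches only an empty actual list; this is why even compliant test data needs an (empty) expected.textproto.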

The libtooling folder contains the libtooling implementation for the rule. If there is no libtooling implementation, this folder will not exist.

rule_1.go contains the logic for calling the checker binary. It calls different tool runners from //cruleslib/runner/runner.go, such as RunLibtooling, RunCppcheck, etc., to specify the checker to be run.

rule_1_test.go contains the logic for go test testing, like:

func TestBad0001(t *testing.T) {
	tc := testcase.NewWithSystemHeader(t, "_bad0001")
	tc.ExpectOK(testlib.ToTestResult(Analyze(tc.Srcdir, tc.Options)))
}

ExpectOK means that the result of the Analyze function matches the content in expected.textproto, while ExpectFailure indicates a mismatch. NewWithSystemHeader is used to create a test case and add some system library paths.

Integrating into the Container Image

  1. Add toy_rules_deps and bigmain_toy_rules in //podman_image/bigmain/BUILD:
cc_library(
    name = "toy_rules_deps",
    deps = [
        "//toy_rules/rule_1/libtooling:rule_1_lib",
        # Additional custom rules can be added here
    ],
)
cc_binary(
    name = "bigmain_toy_rules",
    srcs = ["main.cc"],
    deps = [
        ":rule",
        ":toy_rules_deps",
        "//libtooling_includes:cmd_options",
        "@com_github_google_glog//:glog",
        "@com_google_absl//absl/strings:str_format",
    ],
)
  2. Add the following in //podman_image/bigmain_symlink:
mkdir /opt/naivesystems/toy_rules
ln -s /opt/naivesystems/bigmain /opt/naivesystems/toy_rules/rule_1
  3. Create a new file //podman_image/Containerfile.toyrules. If the new rule set only checks C code, base the image on misrac; if it checks C++, base it on misracpp. Then copy bigmain_toy_rules into the image. If a Chinese image is needed, use dev instead of dev_en for base_tag.
ARG base_tag=dev
FROM naive.systems/analyzer/misracpp:${base_tag}
COPY "bigmain_toy_rules" "/opt/naivesystems/bigmain"
  4. Finally, add the bigmain_toy_rules and build-toy-rules-en targets in //podman_image/Makefile to generate the image.

After these steps, run make build-toy-rules-en in //podman_image to obtain a container image containing the new rule set. Add toy_rules/rule_1 to the .naivesystems/check_rules file of the project to be tested; you can then execute the following command to run static analysis using the generated image:

podman run -v $PWD:/src:O -v $PWD/.naivesystems:/config:Z \
  -v $PWD/output:/output:Z -w /src \
  naive.systems/analyzer/toyrules:dev_en \
  /opt/naivesystems/misra_analyzer -show_results -alsologtostderr

How to Write a New Checker

Writing a checker requires choosing a suitable tool, implementing the check with it, and processing the tool's output into a well-formed resultsList.

Determining the Tool Based on the Problem Type

NaiveSystems Analyze is used to check if there are any violations in the project code, including but not limited to resource leaks, memory overflows, stack address escapes, etc. For example, the following code

long l = 100000;
int8_t i = 0;
i = l;

This code may suffer precision loss in the assignment. As another example, consider the following code:

int i = 8 / 0;

This code has a division by zero error.
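Both problems can be reproduced numerically. A short Python sketch, modeling the implicit int8_t narrowing by hand (the -96 follows from keeping only the low 8 bits of 100000):

```python
def to_int8(value):
    # Model C's implicit conversion to int8_t: keep the low 8 bits,
    # then reinterpret them as a two's-complement signed value.
    value &= 0xFF
    return value - 256 if value >= 128 else value

# 100000 does not fit in 8 bits, so the assignment silently changes it:
assert to_int8(100000) == -96

# The second example fails outright: dividing by zero is an error.
try:
    _ = 8 // 0
except ZeroDivisionError:
    pass
```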

We generally divide the problems to be checked into two categories. One is STU (single translation unit), which are errors that can be detected within a single translation unit, like the precision loss mentioned above. The other is CTU (cross translation unit), which are errors that need to be checked across multiple translation units. For example, for a division by zero error, there might be a case like this:

// test.h
int getDiv(int a, int b);

// test.cc
int getDiv(int a, int b) {
    return a / b;
}

// main.cc
#include "test.h"

int main() {
    getDiv(10, 0);
    return 0;
}

This requires both test.cc and main.cc translation units to identify the error.

From another perspective, we can also divide problems into two types: those that can be directly identified on the AST and those that require deep analysis to resolve. For example, for the precision loss issue:

void test(void)
{
    long l = 100000;
    int i = 1;
    i = l;
}

We use Clang to dump out the AST

$ clang -Xclang -ast-dump -fsyntax-only test.c
`-FunctionDecl 0x1208e23e8 <test.c:1:1, line:6:1> line:1:6 test 'void (void)'
  `-CompoundStmt 0x1208e26e8 <line:2:1, line:6:1>
    |-DeclStmt 0x1208e2588 <line:3:5, col:20>
    | `-VarDecl 0x1208e24e8 <col:5, col:14> col:10 used l 'long' cinit
    |   `-ImplicitCastExpr 0x1208e2570 <col:14> 'long' <IntegralCast>
    |     `-IntegerLiteral 0x1208e2550 <col:14> 'int' 100000
    |-DeclStmt 0x1208e2640 <line:4:5, col:14>
    | `-VarDecl 0x1208e25b8 <col:5, col:13> col:9 used i 'int' cinit
    |   `-IntegerLiteral 0x1208e2620 <col:13> 'int' 1
    `-BinaryOperator 0x1208e26c8 <line:5:5, col:9> 'int' '='
      |-DeclRefExpr 0x1208e2658 <col:5> 'int' lvalue Var 0x1208e25b8 'i' 'int'
      `-ImplicitCastExpr 0x1208e26b0 <col:9> 'int' <IntegralCast>
        `-ImplicitCastExpr 0x1208e2698 <col:9> 'long' <LValueToRValue>
          `-DeclRefExpr 0x1208e2678 <col:9> 'long' lvalue Var 0x1208e24e8 'l' 'long'

From the AST, we can see an ImplicitCastExpr from long to int, indicating a code violation. The main characteristic of this type of problem is that the violation can be observed directly from the code structure.
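As a toy illustration of "directly observable from the code structure", the narrowing can even be spotted in the textual dump. This is only a sketch for intuition; real checkers match AST nodes with AST Matchers, not dump text:

```python
def find_long_to_int_casts(ast_dump):
    # Text-level toy: flag an ImplicitCastExpr to 'int' (IntegralCast)
    # whose child expression, on the next lines, has type 'long'.
    lines = ast_dump.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if "ImplicitCastExpr" in line and "'int' <IntegralCast>" in line:
            if any("'long'" in child for child in lines[i + 1:i + 3]):
                hits.append(i)
    return hits
```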

However, for division-by-zero problems, the divisor could be the result of a complex calculation, like:

int d = 5;
int i = 10 / (d - d);

Here, we merely need to evaluate d - d to detect the error. In more complex cases, this computation can be intricate, or the value may be produced elsewhere, so deep analysis is required to resolve the issue.
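The point is that the AST alone does not say the divisor is zero; something must evaluate it. A deliberately naive sketch (a real analyzer uses symbolic execution and constraint solving, not eval):

```python
def divisor_may_be_zero(divisor_expr, env):
    # Evaluate the divisor under the known variable bindings; if we
    # cannot evaluate it, conservatively assume zero is possible.
    try:
        return eval(divisor_expr, {"__builtins__": {}}, dict(env)) == 0
    except Exception:
        return True
```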

For problems that can be directly concluded from the AST, we generally use tools like libtooling, cppcheck, ClangSema, ClangTidy, etc. For problems that require deep analysis to solve, we generally use CSA and Infer.

libtooling checker

libtooling is an official tool from Clang, which can be understood as a kind of Clang plugin. Users write AST Matchers to match the desired AST nodes, and then a predefined callback (Callback) function is automatically called for processing.

For instance, for MISRA C++:2008 4.10.1 NULL must not be used as an integer value, the code we write is as follows:

class Callback : public MatchFinder::MatchCallback {
 public:
  void Init(ResultsList* results_list, MatchFinder* finder) {
    results_list_ = results_list;
    finder->addMatcher(
        implicitCastExpr(hasSourceExpression(expr(gnuNullExpr())),
                         hasImplicitDestinationType(isInteger()),
                         unless(isExpansionInSystemHeader()))
            .bind("cast"),
        this);
  }

  void run(const MatchFinder::MatchResult& result) override {
    const Expr* expr = result.Nodes.getNodeAs<Expr>("cast");
    string error_message = "NULL must not be used as an integer value";
    string path = GetFilename(expr, result.SourceManager);
    int line = GetLine(expr, result.SourceManager);
    AddResultToResultsList(results_list_, path, line, error_message);
  }

 private:
  ResultsList* results_list_;
};

This code matches every implicitCastExpr whose source expression is a gnuNullExpr and whose implicit destination type is an integer, excluding matches expanded in system headers. Whenever a match is found, the callback function run is invoked automatically to report the error.

It is an STU type problem, so it is called using checker_integration.Libtooling_STU to specify the type:

runner.RunLibtooling(srcdir, "misra_cpp_2008/rule_4_10_1", checker_integration.Libtooling_STU, opts)

Full code reference at //misra_cpp_2008/rule_4_10_1.

For more details on how to write a libtooling checker, please refer to the official libtooling documentation and related tutorials.

About the structure of the libtooling folder, taking //toy_rules/rule_1 as an example:

rule_1
├── libtooling
│   ├── BUILD
│   ├── checker.cc
│   ├── checker.h
│   ├── lib.h
│   ├── main.cc
│   └── rule_1.cc
  • checker.h and checker.cc contain the specific implementation of the checker.
  • BUILD contains the Bazel definition.
  • main.cc includes the logic for calling the libtooling checker, parsing arguments, and writing the checker analysis results to a specified file.

Apart from the specific implementation logic of the checker, other parts are generally similar across all implementations. During development, you can run bazel build rule_1 in this folder to generate a callable binary.

cppcheck checker

cppcheck is an open-source tool that has been checked into our codebase: //third_party/cppcheck. After we compile, a binary named cppcheck is generated in this folder.

When analyzing code, we first use this binary to generate a dumpfile, a structured representation of the file similar to a Clang AST, and then analyze that dumpfile with a script. For example, suppose we have the code

#include <clocale> // Non-compliant

int main() {
    return 0;
}

First, generate a dumpfile

~/analyze/third_party/cppcheck/cppcheck --dump main.cpp

Then use the command

~/analyze/third_party/cppcheck/cppcheck --abspath --dump --std=c99 --dump-file=main.cpp.c99.dump main.cpp
~/analyze/third_party/cppcheck/addons/misra.py --check_rules=misra_c_2012/rule_15_1 --output_dir output main.cpp.c99.dump

to get the analysis results. These commands are generally used during development, but the actual calls are wrapped in the runner:

runner.RunCppcheck(srcdir, "misra_c_2012/rule_15_1", checker_integration.Cppcheck_STU, opts)

Note that cppcheck cannot generate a dumpfile for files that contain only preprocessor directives. When writing such test cases, an additional main function needs to be added.

Our cppcheck checkers are all implemented in //third_party/cppcheck/addons/misra.py; search for the name of the relevant rule (such as rule_15_1) to view the corresponding checker.

Using //toy_rules/rule_2 as an example, some calling logic needs to be added in //third_party/cppcheck/addons/misra.py.

If toy_rules is a rule set that has never appeared in misra.py before, you need to:

  1. Create a new function executeToyRuleCheck to execute the checker.
    def executeToyRuleCheck(self, check_function, *args):
        check_function(*args)
  2. Create a new class ToyRuleResult to encapsulate the results of the checker.
class ToyRuleResult:
    def __init__(self, path, line_num, err_msg, other_locations = None):
        self.path = path
        self.line_number = line_num
        self.error_message = f'{err_msg}'
        self.locations = [ErrorLocation(path, line_num)]
        if other_locations is not None:
            for loc in other_locations:
                self.locations.append(ErrorLocation(loc.file, loc.linenr))
  3. Create a new function reportToyRuleError to add results to the JSON list and output results to stdout/stderr. The specific format of error_message can be defined according to your needs; here, only an error_id is passed.
    def reportToyRuleError(self, location, rule_num, other_locations = None):
        if self.settings.verify:
            self.verify_actual.append('%s:%d %d' % (location.file, location.linenr, rule_num))
        else:
            error_id = f"Rule-{rule_num}"
            toyrule_severity = 'Required'
            this_violation = '{}-{}-{}-{}'.format(location.file, location.linenr, location.column, rule_num)
            # If this is new violation then record it and show it. If not then
            # skip it since it has already been displayed.
            if this_violation not in self.existing_violations:
                self.existing_violations.add(this_violation)
                self.current_json_list.append(ToyRuleResult(location.file, location.linenr, error_id, other_locations))
                cppcheckdata.reportError(location, toyrule_severity, "", "toy", error_id)
                if toyrule_severity not in self.violations:
                    self.violations[toyrule_severity] = []
                self.violations[toyrule_severity].append('toy' + "-" + error_id)

The JSON list of results will be located in //toy_rules/rule_2/_bad0001/output/tmp/test_run/test_run-*/cppcheck_out.json and will look something like:

[
    {
        "path": "/home/username/analyze/toy_rules/rule_2/_bad0001/bad.c",
        "line_number": 3,
        "error_message": "Rule-2",
        "locations": [
            {
                "path": "/home/username/analyze/toy_rules/rule_2/_bad0001/bad.c",
                "line_number": 3
            }
        ]
    }
]
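Downstream code only needs the path, line, and message from each entry. A sketch of loading such a cppcheck_out.json payload into plain triples (field names follow the sample output above):

```python
import json

def load_cppcheck_results(text):
    # Parse a cppcheck_out.json payload into (path, line, message)
    # triples, ignoring the auxiliary locations field.
    return [(r["path"], r["line_number"], r["error_message"])
            for r in json.loads(text)]
```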

If the rule set already exists, add the implementation of a checker for a specific rule in the rule set:

  1. Add the implementation function toy_rule_2 (reference MISRA C:2012 15.1 - The goto statement should not be used).
    def toy_rule_2(self, data):
        for token in data.tokenlist:
            if token.str == "goto":
                self.reportToyRuleError(token, 2)
  2. Add the function toy_rule_2 to the parseDump list. Pass the required parameters, such as cfg, data.rawTokens, or dumpfile, based on the rule's requirements.
            if "toy_rules/rule_2" in rules_list or check_rules == "all":
                self.executeToyRuleCheck(self.toy_rule_2, cfg)

Afterward, the runner can be used to call the corresponding cppcheck implementation for toy_rules/rule_2 and check its output:

runner.RunCppcheck(srcdir, "toy_rules/rule_2", checker_integration.Cppcheck_STU, opts)

ClangSema Checker

ClangSema is a tool we use to check certain issues via Clang diagnostic flags. Typically these are issues that the Clang plugin in VSCode flags directly, or that produce a warning during compilation. For instance, for "AUTOSAR A5-3-3: Pointers to incomplete class types shall not be deleted," compilation generates a warning like this:

bad1.cpp:6:13: warning: deleting pointer to incomplete type 'C::Impl' may cause undefined behavior [-Wdelete-incomplete]
            delete pimpl;
            ^      ~~~~~

We can directly use the -Wdelete-incomplete diagnostic flag to invoke ClangSema in Analyze, and check if the returned error contains related keywords:

results, err := runner.RunClangSema(srcdir, "-Wdelete-incomplete", opts)

The complete example is at //autosar/rule_A5_3_3.

The list of issues that can be checked by ClangSema is at DiagnosticsReference.
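The keyword check performed on the ClangSema output can be pictured as simple diagnostic parsing. A sketch with a hypothetical helper (the real filtering lives in the runner, and Clang's diagnostic format may vary):

```python
import re

def parse_clang_warnings(output, flag):
    # Extract (path, line, message) from Clang diagnostics produced
    # by the given -W flag, matching lines like:
    #   file.cpp:6:13: warning: <message> [-Wsome-flag]
    pattern = re.compile(r"^(.+?):(\d+):\d+: warning: (.+) \[(-W[\w-]+)\]$")
    results = []
    for line in output.splitlines():
        m = pattern.match(line)
        if m and m.group(4) == flag:
            results.append((m.group(1), int(m.group(2)), m.group(3)))
    return results
```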

ClangTidy Checker

ClangTidy is another tool provided by Clang, implemented internally using libtooling. It essentially has some libtooling checkers pre-written, which we can use directly:

runner.RunClangTidy(srcdir, args, opts)

Here, args is a string list that can be used to check the same rule with multiple checkers, like in the example at //autosar/rule_A7_5_2.

The list of issues that can be checked by ClangTidy is available at https://clang.llvm.org/extra/clang-tidy/checks/list.html.

CSA Checker

Please read the CSA (Clang Static Analyzer) official documentation first. In summary, CSA is a path-sensitive interprocedural analysis tool based on symbolic execution technology, used to solve problems dependent on state, such as the aforementioned division by zero issue.

We have also checked CSA into our code repository. You can find the currently implemented checkers in //third_party/llvm-project/clang/lib/StaticAnalyzer/Checkers; some are native to Clang, while those named after rule sets are our own implementations.

For example, in the DivZeroChecker, checkPreStmt is a callback that CSA invokes automatically each time it encounters a new Stmt. Here we check whether the BinaryOperator is a division; if so, we take its right-hand side, the divisor, and use the ConstraintManager to determine whether it can be zero. ConstraintManager and CheckerContext store the states of variables and symbols; if zero is possible, an error is reported.

After implementing a specific checker, it can be invoked with -analyzer-checker; multiple CSA checkers can be enabled at once, separated by commas:

runner.RunCSA(srcdir, "-analyzer-checker=core.DivideZero", opts)

Relevant examples are in //autosar/rule_A5_6_1 and //autosar/rule_A0_4_4.

Infer Checker

Infer is a tool similar to CSA, and the list of issues it can check is at all-issue-types.

We invoke it using the corresponding issue type:

runner.RunInfer(srcdir, "--liveness", opts)

Relevant examples are in //misra_c_2012_crules/rule_2_2.

Go Checker

Some issues can be resolved simply by processing the files as strings, such as //autosar/rule_A13_6_1.

Other Tools

We have also integrated other tools, such as analysis based on errors or warnings from Clang (RunClangForErrorsOrWarnings), GCC (RunGCC), and Cpplint (RunCpplint). For more details, refer to //cruleslib/runner/runner.go.

Combining Multiple Checkers

If there's only one checker, the resultsList generated by the runner typically meets the requirements. If multiple checkers are needed to check the same rule, simply combine the results from all checkers in the end. Related examples can be found in //misra_c_2012_crules/rule_2_2.
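Combining amounts to concatenating the per-checker lists and dropping duplicates. A sketch with hypothetical result dicts shaped like the JSON output shown earlier:

```python
def merge_results(*results_lists):
    # Concatenate the per-checker results lists, dropping exact
    # duplicates while preserving first-seen order.
    seen, merged = set(), []
    for results in results_lists:
        for r in results:
            key = (r["path"], r["line_number"], r["error_message"])
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged
```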