SCANPY-248 Auto detect test code by guillaume-dequenne · Pull Request #318 · SonarSource/sonar-scanner-python

guillaume-dequenne · 2026-05-08T12:24:24Z

No description provided.

hashicorp-vault-sonar-prod · 2026-05-08T12:24:43Z

sonar-review-alpha · 2026-05-08T12:24:53Z

Summary

This PR adds automatic test directory detection for Python projects. The feature infers sonar.tests configuration by checking multiple sources in order: pyproject.toml [tool.pytest.ini_options] testpaths, pytest.ini, tox.ini, setup.cfg, and—as a fallback—filesystem conventions (directories named "tests", "test", or "testing"). The detection only runs if sonar.tests hasn't been explicitly set by the user in higher-priority sources (CLI args, environment variables, or config files), avoiding unwanted overrides. This eliminates manual configuration for users following Python testing conventions.

What reviewers should know

Core logic: The python_project_loader.py module implements a chain of loaders that try each config source in order. Each loader returns None if the config file is absent, an empty string if testpaths is declared but empty or invalid, or a comma-separated path string if valid directories are found. Only the filesystem fallback runs if no config file declared testpaths.

Key decision: _existing_paths() filters out non-existent directories and handles absolute paths (including Windows drive-relative paths) carefully. This prevents silent failures where users declare invalid paths.

Configuration integration: In configuration_loader.py, the auto-detection runs last—after all higher-priority sources—and only if SONAR_TESTS is not already set. This respects user intent.

Test coverage: Comprehensive unit tests (test_python_project_loader.py) cover each config source, multiline/list formats, non-existent paths, malformed files, and edge cases. Integration tests (test_auto_detect_tests.py) verify end-to-end functionality with SonarQube, confirming test files are correctly classified.

Minor setup: .gitignore adds .sonar/, and pyproject.toml configures norecursedirs to prevent pytest from scanning test source directories during development.

Generate Walkthrough
Generate Diagram

🗣️ Give feedback

sonarqube-next · 2026-05-12T11:50:25Z

Quality Gate passed for 'Python Scanner'

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0 Dependency risks
94.9% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

sonar-review-alpha

Solid implementation overall — the chain-of-responsibility pattern is clean, the test coverage is thorough, and the integration with the existing configuration loader is correct. One real bug needs fixing before merge.

The previously flagged get_project_test_files pagination issue is still open.

🗣️ Give feedback

sonar-review-alpha · 2026-05-12T11:53:26Z

+        if result is None:
+            continue  # file absent or no testpaths key — try next source
+        if result:
+            return {SONAR_TESTS: result}


Bug: The three-way return distinguishes None (file absent / no key) from "" (key declared, all paths invalid) from a non-empty string (success), but the caller in load() checks if result: which collapses None and "" into the same branch only on the truthy/falsy axis. The logic actually works today because of the preceding if result is None: continue, but the empty-string sentinel is a fragile contract.

More importantly: _existing_paths silently skips paths that are outside the project root (returns them filtered out), so if all declared paths are outside the root, paths is [] and the function falls into the warning + return "" branch — correct. But if _existing_paths raises an unexpected exception (e.g. PermissionError from is_dir()), the outer try/except in _load_from_pyproject_toml catches it and returns None instead of "", letting the chain fall through. This was flagged before and acknowledged, but the sentinel design amplifies the impact — the caller cannot distinguish between the two cases.

No code change required for the acknowledged filesystem I/O issue, but the return "" sentinel deserves a comment explaining why "" and not None (it's used to stop the chain when testpaths was declared but all paths were invalid).

Mark as noise

sonar-review-alpha · 2026-05-12T11:53:26Z

+        logging.debug(f"Error reading pyproject.toml for pytest testpaths: {e}")
+        return None
+    testpaths = toml_dict.get("tool", {}).get("pytest", {}).get("ini_options", {}).get("testpaths")
+    if not testpaths:


The testpaths value from pyproject.toml can be a string (space-separated) in addition to a list, per pytest docs. The current code handles that via testpaths.split() when it's not a list. However, if testpaths is an integer or another scalar type in the TOML (e.g. testpaths = 1), str(p).strip() will coerce it silently rather than skipping it, and _existing_paths will then look for a directory named "1". This is a minor edge case but worth guarding:

if isinstance(testpaths, list): raw = [str(p) for p in testpaths if str(p).strip()] elif isinstance(testpaths, str): raw = [p for p in testpaths.split() if p] else: logging.debug(f"Unexpected type for testpaths in pyproject.toml: {type(testpaths)}") return None

Mark as noise

sonar-review-alpha · 2026-05-12T11:53:26Z

+        resp.raise_for_status()
+        return resp.json().get("task", {})
+
+    def wait_for_ce_task_by_id(self, task_id: str) -> None:


Logic duplication: wait_for_ce_task_by_id polls with MAX_RETRIES * 10 iterations and time.sleep(2) between them. The existing wait_for_analysis_completion method (not shown in the diff but referenced in the fallback path) almost certainly uses a similar polling loop. If the retry budget, sleep interval, or terminal-state set ever needs to change, both loops must be updated. Extract the shared polling logic into a private _poll_until helper, or at minimum define the sleep duration and terminal states as class-level constants so both methods share them.

Mark as noise

Seppli11

Overall, the cahnge looks good. It's really cool to see that we're finally parsing settings from the pyproject.toml file and co. (Hopefully the python version is next 🤞 )

One thing I'm not sure about is if it makes sense to also have a heuristic based on the folders in the scanner. It seems to create quite a few strange places where, IMO, the outcome isn't that straight forward.

The other, more nitpicky thing, python_project_loader doesn't seem like an appropriate name for this module, given how it is used and how closely tight it usage is to the present of the test property.

Let me know if you want to discuss anything

Seppli11 · 2026-05-12T14:20:06Z

+        # Running python_project_loader unconditionally would emit confusing warnings about
+        # pytest config even when the result would be discarded.
+        if SONAR_TESTS not in resolved_properties:
+            resolved_properties.update(python_project_loader.load(base_dir))


In a more holistic view, applying the properties from the python_project_loader only if sonar.tests is not set doesn't really make sense to me. If we for example extend the python_project_loader to also detect the python version, things become weird.

Maybe it could make sense renaming the module to python_project_test_source_loader or similar (maybe you find a catchier name 😅 )

Seppli11 · 2026-05-12T15:06:21Z

+    return _load_from_ini_file(base_dir, "setup.cfg", _SETUP_CFG_PYTEST_SECTION)
+
+
+def _load_from_filesystem(base_dir: pathlib.Path) -> Optional[str]:


Does it make sense to add this heuristic here as well, given that sonar-python itself has a more extensive set of heuristics.

I think the heuristic from sonar-python is a bit of a crutch in the sense that it will only disable rules and still report metrics based on the FileType, because we can't contradict the scanner for the file type classification.
This will reduce noise but ultimately still produce an "inconsistent" analysis.

Having it here makes the analysis more consistent because the rule execution is in sync with the metrics reporting. It's an argument to use pysonar instead of the generic sonar-scanner-cli because here, the manual setup of sonar.tests is truly optional (assuming what is inferred matches the user expectations).

That is a good point. Maybe it would make sense to only fall back to the heuristic if the user hasn't set anything, rather than we haven't found a valid test folder

Seppli11 · 2026-05-12T15:10:21Z

+        self._wait_for_ce_completion(workdir)
        return process
+
+    def _wait_for_ce_completion(self, workdir: pathlib.Path) -> None:


This method does a lot of automatic checks and fallbacks. Is this really necessary for a test?

Seppli11 · 2026-05-12T15:14:43Z

+        resp.raise_for_status()
+        return resp.json()
+
+    def get_latest_ce_task(self) -> Optional[dict]:


this seems to be never called. Same applies to search_projects and get_project_measures

Seppli11 · 2026-05-12T15:24:43Z

+            testpaths = []
+            """,
+        )
+        result = python_project_loader.load(Path("."))


If we keep the heuristic, this could lead to a weird behavior where the pyrpoject.tol loader returned an empty set, while the file heuristic loader found a folder. This would now override what the user set

Seppli11 · 2026-05-12T15:27:41Z

+        self.fs.create_dir("tests")
+        # nonexistent/ is not on disk — only tests/ should be returned
+        result = python_project_loader.load(Path("."))
+        self.assertEqual(result[SONAR_TESTS], "tests")


A similar issue could appear here, where for whatever reason, all the user specified tests are not valid paths. If the heuristic picks them up, we're overriding what the user explicitly told us.

Seppli11 · 2026-05-12T15:28:11Z

+    def test_load_from_pytest_ini(self):
+        self.fs.create_file(
+            "pytest.ini",
+            contents="[pytest]\ntestpaths = tests integration\n",


nitpick: multiline strings would be nicer

hashicorp-vault-sonar-prod Bot changed the title ~~Auto detect test code~~ SCANPY-248 Auto detect test code May 8, 2026