nondeterministic diagnostics across repeated runs on an unchanged workspace #1091

emmylua_check produces nondeterministic diagnostics across repeated runs on an unchanged workspace

Environment:
 - emmylua_check 0.23.1
 - Linux
 - text output mode
 - same workspace, same config, no file changes between runs

Running emmylua_check repeatedly on an unchanged workspace produces different warning counts and different per-file diagnostic sections.

Example from three consecutive runs on the same tree:

 - run 1: 70 warnings, 2 hints
 - run 2: 72 warnings, 2 hints
 - run 3: 63 warnings, 2 hints

This is not just output ordering. Entire diagnostic sections for some files appear/disappear between runs.

Expected behavior: identical diagnostics on repeated runs against the same workspace/config.

Actual behavior: warning counts change between runs, and diagnostics for whole files can appear in one run and disappear in the next.

How to reproduce:

Run emmylua_check multiple times against the same unchanged workspace.

Compare the outputs. For example:
```
for i in 1 2 3; do
  emmylua_check -c .emmyrc.json src > "run-$i.txt" 2>&1 || true
done

sha256sum run-*.txt
diff -u run-1.txt run-2.txt
diff -u run-2.txt run-3.txt
```

What I already ruled out:

 - this is not caused by my wrapper script; the wrapper only execs:
```
emmylua_check -c .emmyrc.json src
```
 - the workspace was unchanged between runs
 - setting `TOKIO_WORKER_THREADS=1` reduced the spread but did not eliminate it

Why I suspect an internal race/order-dependence:

emmylua_check diagnoses files concurrently from a shared analysis object:

```
for file_id in need_check_files.clone() {
    let sender = sender.clone();
    let analysis = analysis.clone();
    tokio::spawn(async move {
        let cancel_token = CancellationToken::new();
        let diagnostics = analysis.diagnose_file(file_id, cancel_token);
        sender.send((file_id, diagnostics)).await.unwrap();
    });
}
```

So each file is diagnosed in its own task, all sharing the same Arc<analysis>. The observed behavior looks like either:

 - a race in shared lazy caches, or
 - order-dependent analysis results across files
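
To make the second hypothesis concrete, here is a toy model of what I mean by order-dependence. It is not emmylua code: `ToyAnalysis`, its cache, and the "warn unless you resolved the symbol first" rule are all invented for illustration. It only shows that a shared, lazily filled cache can move a diagnostic from one file to another depending on which file is diagnosed first, even with correct locking.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a shared analysis object (not emmylua's type).
struct ToyAnalysis {
    /// Lazily filled cache: remembers which file resolved a symbol first.
    first_resolver: Mutex<HashMap<&'static str, u32>>,
}

impl ToyAnalysis {
    fn new() -> Self {
        ToyAnalysis { first_resolver: Mutex::new(HashMap::new()) }
    }

    /// Returns the number of warnings for `file_id`.
    /// Invented rule: the file that fills the cache gets 0 warnings,
    /// every later file that hits the cached entry gets 1.
    fn diagnose_file(&self, file_id: u32) -> usize {
        let mut cache = self.first_resolver.lock().unwrap();
        let owner = *cache.entry("shared_symbol").or_insert(file_id);
        if owner == file_id { 0 } else { 1 }
    }
}

/// Diagnose the files in the given order against a fresh analysis,
/// then sort by file id so only the per-file results can differ.
fn diagnose_in_order(order: &[u32]) -> Vec<(u32, usize)> {
    let analysis = ToyAnalysis::new();
    let mut results: Vec<(u32, usize)> = order
        .iter()
        .map(|&file_id| (file_id, analysis.diagnose_file(file_id)))
        .collect();
    results.sort();
    results
}

fn main() {
    // Same files, same analysis logic, different visiting order.
    println!("{:?}", diagnose_in_order(&[1, 2])); // [(1, 0), (2, 1)]
    println!("{:?}", diagnose_in_order(&[2, 1])); // [(1, 1), (2, 0)]
}
```

With one spawned task per file, the visiting order is decided by the scheduler, so an effect like this would show up as per-file sections appearing and disappearing between runs. Whether anything like it actually exists inside `diagnose_file` is exactly what I cannot tell from the outside.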

It would be useful to know whether diagnose_file is intended to be safely parallel on a shared analysis instance, or whether this should be serialized.

A `--jobs 1` / serial-diagnosis mode would also help as a workaround if full determinism is not guaranteed today.
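
For clarity, this is roughly the behavior I have in mind for a serial mode. It is a sketch under assumptions, not a patch: `FileId`, `Diagnostic`, and `diagnose_serially` are hypothetical names, the closure stands in for `analysis.diagnose_file(file_id, cancel_token)`, and `--jobs` is a flag I am proposing, not one that exists today.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; emmylua_check's real types differ.
type FileId = u32;
type Diagnostic = String;

/// Diagnose files one at a time on the calling thread, in ascending
/// file-id order, so both the analysis order and the output order are fixed.
fn diagnose_serially<F>(
    mut files: Vec<FileId>,
    mut diagnose_file: F,
) -> BTreeMap<FileId, Vec<Diagnostic>>
where
    F: FnMut(FileId) -> Vec<Diagnostic>,
{
    // Fix the visiting order regardless of how the file list was built.
    files.sort_unstable();
    files.dedup();

    let mut results = BTreeMap::new();
    for file_id in files {
        // No spawn: the next file starts only after this one finishes.
        results.insert(file_id, diagnose_file(file_id));
    }
    results
}

fn main() {
    let mut visit_order = Vec::new();
    let results = diagnose_serially(vec![3, 1, 2, 1], |file_id| {
        visit_order.push(file_id);
        vec![format!("warning in file {file_id}")]
    });

    println!("visited: {:?}", visit_order);
    for (file_id, diagnostics) in &results {
        println!("{file_id}: {diagnostics:?}");
    }
}
```

A fixed visiting order would not remove an order-dependence bug, but it would make the result reproducible, which is enough for CI use in the meantime.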
