nondeterministic diagnostics across repeated runs on an unchanged workspace #1091

@splitice

emmylua_check produces nondeterministic diagnostics across repeated runs on an unchanged workspace

Environment:

  • emmylua_check 0.23.1
  • Linux
  • text output mode
  • same workspace, same config, no file changes between runs

Running emmylua_check repeatedly on an unchanged workspace produces different warning counts and different per-file diagnostic sections.

Example from three consecutive runs on the same tree:

  • run 1: 70 warnings, 2 hints
  • run 2: 72 warnings, 2 hints
  • run 3: 63 warnings, 2 hints

This is not just output ordering. Entire diagnostic sections for some files appear/disappear between runs.

Expected behavior: identical diagnostics on repeated runs against the same workspace/config.
Actual behavior: warning counts change between runs, and diagnostics for whole files can appear in one run and disappear in the next.

How to reproduce:

Run emmylua_check multiple times against the same unchanged workspace.

Compare the outputs.
Example:

for i in 1 2 3; do
  emmylua_check -c .emmyrc.json src > "run-$i.txt" 2>&1 || true
done

sha256sum run-*.txt
diff -u run-1.txt run-2.txt
diff -u run-2.txt run-3.txt

What I already ruled out:

  • this is not caused by my wrapper script; the wrapper only execs: emmylua_check -c .emmyrc.json src
  • the workspace was unchanged between runs
  • setting TOKIO_WORKER_THREADS=1 reduced the spread but did not eliminate it

Why I suspect an internal race/order-dependence:

emmylua_check diagnoses files concurrently from a shared analysis object:

for file_id in need_check_files.clone() {
    let sender = sender.clone();
    let analysis = analysis.clone();
    tokio::spawn(async move {
        let cancel_token = CancellationToken::new();
        let diagnostics = analysis.diagnose_file(file_id, cancel_token);
        sender.send((file_id, diagnostics)).await.unwrap();
    });
}

So each file is diagnosed in its own task, and all tasks share the same Arc-wrapped analysis instance. The observed behavior looks like either:

  • a race in shared lazy caches, or
  • order-dependent analysis results across files
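To make the second hypothesis concrete, here is a minimal sketch of how a "first writer wins" lazy cache can turn the same inputs into different warning counts depending on which file is diagnosed first. Everything in it is a hypothetical stand-in (the `Analysis` struct, its `global_type` cache, and the odd/even rule for what each file declares); it is not emmylua's real code, and I have not confirmed that the real analysis has a cache shaped like this.

```rust
/// Hypothetical stand-in for a shared analysis object. NOT emmylua's real type.
struct Analysis {
    /// Lazily inferred type of one global; whichever file is diagnosed
    /// first fills it in, and later files are checked against it.
    global_type: Option<&'static str>,
}

impl Analysis {
    fn new() -> Self {
        Analysis { global_type: None }
    }

    /// Diagnose one file. In this toy model, odd file ids assign a number
    /// to the global and even file ids assign a string.
    fn diagnose_file(&mut self, file_id: u32) -> Vec<String> {
        let declared = if file_id % 2 == 0 { "string" } else { "number" };
        // First caller populates the cache; everyone else reads it.
        let cached = *self.global_type.get_or_insert(declared);
        if cached != declared {
            vec![format!(
                "file {}: expected {}, found {}",
                file_id, cached, declared
            )]
        } else {
            Vec::new()
        }
    }
}

/// Diagnose the files in the given order on one shared `Analysis`
/// and return the total number of warnings.
fn count_warnings(order: &[u32]) -> usize {
    let mut analysis = Analysis::new();
    order
        .iter()
        .map(|&id| analysis.diagnose_file(id).len())
        .sum()
}
```

With files 1, 2, 3, calling `count_warnings(&[1, 2, 3])` yields 1 warning, while `count_warnings(&[2, 1, 3])` yields 2, even though the set of files is identical. Under concurrent tasks the effective order is whatever the scheduler produces, which would match both the changing counts and whole per-file sections appearing and disappearing.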

It would be useful to know whether diagnose_file is intended to be safely parallel on a shared analysis instance, or whether this should be serialized.

A --jobs 1 / serial-diagnosis mode would also help as a workaround if full determinism is not guaranteed today.
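For illustration, this is roughly what I would expect a serial mode to do: visit files one at a time in a fixed order and key the results by file id, so that both the analysis order and the output order are reproducible. It is only a sketch under assumptions: `diagnose_serially` is a made-up helper, file ids are modelled as plain `u32`, and the `diagnose` closure stands in for a call like `analysis.diagnose_file(file_id, cancel_token)`.

```rust
use std::collections::BTreeMap;

/// Hypothetical serial driver: diagnose each file exactly once, in
/// ascending file-id order, and return results sorted by file id.
/// `diagnose` is a placeholder for the real per-file diagnosis call.
fn diagnose_serially<F>(mut file_ids: Vec<u32>, mut diagnose: F) -> BTreeMap<u32, Vec<String>>
where
    F: FnMut(u32) -> Vec<String>,
{
    // Fix the visiting order so that any order-dependent cache is always
    // populated by the same file first.
    file_ids.sort_unstable();
    file_ids.dedup();

    let mut results = BTreeMap::new();
    for id in file_ids {
        // One file at a time: no two diagnoses overlap.
        results.insert(id, diagnose(id));
    }
    results
}
```

This would not fix an underlying order dependence; it would only pin the order, which should be enough to make CI output stable while the root cause is investigated.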
