Feature request: Add something like `must_match_file` in addition to `md5sum` #179

stianlagstad · 2023-06-27T13:34:04Z

Thank you very much for this valuable tool.

I'd like to propose new functionality for comparing a file against a stored, known-to-be-correct file. The assertion could be named something like must_match_file. In many cases md5sum is enough for me to verify whether a result file has changed at all. However, when something has changed, it would be helpful to see exactly what changed in the test failure output. If an assertion like must_match_file existed, it could show how the new result differs from the stored result. This is similar to "golden testing"; see https://ro-che.info/articles/2017-12-04-golden-tests.


rhpvorderman · 2023-07-26T07:43:59Z

There was a PR for a diff recently: #175

We had a whole discussion about it. I am curious what your thoughts are.

stianlagstad · 2023-08-14T12:39:37Z

Thanks for responding! I just read the discussion in #175, and now I understand a bit more about the pros and cons. What I'm doing locally right now is using this helper function:

import pathlib
import subprocess
from typing import Optional


def diff_files(
    file1: pathlib.Path,
    file2: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    """Run the system `diff` on two files; print the diff and raise if it fails."""
    cmd = ["diff"]
    if ignore_lines_matching is not None:
        # -I ignores changes where all lines match the given regular expression.
        cmd += ["-I", ignore_lines_matching]
    cmd += [str(file1), str(file2)]
    res = subprocess.run(cmd, stdout=subprocess.PIPE, encoding="utf-8")
    # diff exits with 0 if the files match, 1 if they differ, and 2 on trouble.
    if res.returncode != 0:
        print(res.stdout)
        raise Exception(
            f"Found a difference between {file1=} and {file2=}, where none were"
            " expected."
        )

and using that in tests like this:

import pathlib

import pytest

# diff_files is the helper function defined above.


@pytest.mark.workflow("Test mymodule")
def test_mymodule_results_file_diff(workflow_dir: str) -> None:
    # The file that we've produced in this test run:
    result_file: pathlib.Path = pathlib.Path(
        workflow_dir,
        "results.vcf",
    )
    # The file stored as the golden truth:
    stored_file: pathlib.Path = pathlib.Path(
        "some_path/results.vcf",
    )

    # Diff the files, ignoring lines that match "fileDate":
    diff_files(result_file, stored_file, ignore_lines_matching="fileDate")

Doing this is good enough for my purposes right now. It'll show the diff output and fail the test if there's an unexpected difference there.
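For anyone who wants the same effect without depending on the `diff` binary, here is a rough sketch using the standard library's `difflib`. The names `unified_diff_lines` and `assert_files_match` are made up for illustration and are not part of pytest-workflow. One assumption to flag: lines matching the pattern are simply dropped from both files before comparing, which is similar in spirit to `diff -I` but not identical to it.

```python
import difflib
import pathlib
import re
from typing import List, Optional


def unified_diff_lines(
    result_file: pathlib.Path,
    stored_file: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> List[str]:
    """Return unified-diff lines between two text files (empty if they match)."""

    def read_lines(path: pathlib.Path) -> List[str]:
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        if ignore_lines_matching is None:
            return lines
        pattern = re.compile(ignore_lines_matching)
        # Simplification: drop matching lines from both files before comparing.
        return [line for line in lines if not pattern.search(line)]

    return list(
        difflib.unified_diff(
            read_lines(stored_file),
            read_lines(result_file),
            fromfile=str(stored_file),
            tofile=str(result_file),
        )
    )


def assert_files_match(
    result_file: pathlib.Path,
    stored_file: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    """Fail with the diff in the assertion message if the files differ."""
    diff = unified_diff_lines(result_file, stored_file, ignore_lines_matching)
    # Putting the diff in the message lets pytest show it in the failure report.
    assert not diff, "Files differ:\n" + "".join(diff)
```

In a test this would be called as `assert_files_match(result_file, stored_file, ignore_lines_matching="fileDate")`. It reads both files fully into memory, so it is probably only suitable for reasonably small text files.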

I'll keep this open if OK for you, and I'll keep pondering on the idea (as well as the discussion in #175 ).

rhpvorderman · 2023-12-05T07:22:16Z

Sorry for my late reply.

> I'll keep this open if OK for you, and I'll keep pondering on the idea (as well as the discussion in #175 ).

Yes, and thanks for thinking about it. Elegantly displaying non-matching files in a useful way would indeed be a good addition, but the added complexity has to be weighed carefully against the number of use cases.
