Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add something like must_match_file in addition to md5sum #179

Open
stianlagstad opened this issue Jun 27, 2023 · 3 comments

Comments

@stianlagstad
Copy link

Thank you very much for this valuable tool.

I'd like to propose to add new functionality to compare a file against a stored, known-to-be-correct file. The assertion could be named something like must_match_file. In many cases it's enough for me to use md5sum to verify whether or not a result file has changed at all. However, when I do see that something has changed it would be helpful to see what exactly changed in the output of the test failure. If an assertion like must_match_file existed, then I could be shown how the new result differs from the stored result. Kind of similar to "golden testing", ref https://ro-che.info/articles/2017-12-04-golden-tests.

@rhpvorderman
Copy link
Member

There was a PR for a diff lately: #175

We had a whole discussion about it. I am curious what your thoughts are.

@stianlagstad
Copy link
Author

Thanks for responding! I just read the discussion in #175, and now I understand a bit more about the pros and cons. What I'm doing locally right now is using this helper function:

def diff_files(
    file1: pathlib.Path,
    file2: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    if ignore_lines_matching is not None:
        res = subprocess.Popen(
            ["diff", "-I", ignore_lines_matching, file1, file2], stdout=subprocess.PIPE
        )
    else:
        res = subprocess.Popen(["diff", file1, file2], stdout=subprocess.PIPE)
    output = res.communicate()[0]
    if res.returncode != 0:
        output_str = str(output, "UTF-8")
        print(output_str)
        raise Exception(
            f"Found a difference between {file1=} and {file2=}, where none were"
            " expected."
        )

and using that in tests like this:

@pytest.mark.workflow("Test mymodule")
def test_mymodule_results_file_diff(workflow_dir: str) -> None:
    # The file that we've produced in this test run:
    result_file: pathlib.Path = pathlib.Path(
        workflow_dir,
        "results.vcf",
    )
    # The file stored as the golden truth:
    stored_file: pathlib.Path = pathlib.Path(
        "some_path/results.vcf",
    )

    # Do a diff between the files
    diff_files(result_file, stored_file, ignore_lines_matching="fileDate")

Doing this is good enough for my purposes right now. It'll show the diff output and fail the test if there's an unexpected difference there.

I'll keep this open if OK for you, and I'll keep pondering on the idea (as well as the discussion in #175 ).

@rhpvorderman
Copy link
Member

Sorry for my late reply.

I'll keep this open if OK for you, and I'll keep pondering on the idea (as well as the discussion in #175 ).

Yes, and thanks for thinking about that. Elegantly displaying non-matching files in a way that is useful is indeed a good addition, but a careful weighing of complexity vs the amount of use cases is also necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants