Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ensure variable finding works properly and add unit tests for variable finding #5535

Merged
merged 7 commits into from
Nov 14, 2024

Conversation

beckermr
Copy link
Contributor

@beckermr beckermr commented Nov 13, 2024

Description

Currently, conda-build will say the variant variable python is used for a statement like

requirements:
  host:
    - python {{ python_min }}

This is incorrect and only python_min is used in this case AFAIK.

It also says this statement does NOT use python when it does

requirements:
  run:
    - {{ pin_compatible('python') }}

This bug was introduced in #2838.

The issue appears to be a buggy regex for searching for variants.

This PR proposes a new regex and adds unit tests that assert the correct behavior.

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Nov 13, 2024
@beckermr beckermr marked this pull request as ready for review November 13, 2024 05:41
@beckermr beckermr requested a review from a team as a code owner November 13, 2024 05:41
Copy link

codspeed-hq bot commented Nov 13, 2024

CodSpeed Performance Report

Merging #5535 will not alter performance

Comparing beckermr:find-vars-unit-tests (6e833ea) with main (98ece01)

Summary

✅ 5 untouched benchmarks

@beckermr
Copy link
Contributor Author

@conda/builds-tools @jezdez @kenodegard @beeankha @jaimergp This request is unusual, but can we consider merging this PR before the next conda-build release?

This PR fixes a very non-trivial bug that has been present in the code base for at least five years. conda-build is misdetecting which variants are being used, which is not great. :/

@jaimergp
Copy link
Contributor

Hm, I wonder if this is behind conda-forge/conda-smithy#2100 and conda-forge/conda-smithy#2071

@beckermr
Copy link
Contributor Author

Looks like it very likely could be. Currently conda-build will confuse the pin_compatible statement for napari-base with using napari.

conda_build/variants.py Outdated Show resolved Hide resolved
@jaimergp
Copy link
Contributor

And side note, I think these regexes (which have received several contributions lately) would benefit from re.VERBOSE. Maybe that helps with the review too.

@beckermr
Copy link
Contributor Author

beckermr commented Nov 13, 2024

That is a good idea, but I'd like to defer to another PR. Also, I am not sure I could actually describe what it is doing. I use regex101.com to build and test regex strings because they are so confusing. :/

@jaimergp
Copy link
Contributor

I use regex101.com to build and test regex strings

Heh, same here but with regexr.com 😂

That is a good idea, but I'd like to defer to another PR.

Let me give it a try, but yes, non-blocking.

@jaimergp
Copy link
Contributor

jaimergp commented Nov 13, 2024

Hm, let's see:

variant_regex = re.compile(
    # This regex matches stuff in Jinja expressions and functions; e.g. {{ python }} and {{ pin_compatible('python') }}
    rf"""
    \{{\{{              # Double curly brace opens Ninja expression
        \s*             # Zero or more spaces
        (?:             # Opens non-capturing group
            pin_[a-z]+  # Matches any lowercase pin_* function name (e.g. pin_compatible)
            \(          # Matches the opening parenthesis of the pin_* function
                \s*?    # Non-greedy glob for any number of spaces (JRG: not sure why we need the non greedy version)
                ['"]    # Single or double quote, required once
        )?              # Closes non-capturing group and makes it optional; i.e. the whole pin_compatible(' part can be ommitted 
        {v_regex}       # The regex from above, should match a variant name
        [^_0-9a-zA-Z]   # One instance of any character that is NOT a number, a letter or an underscore; i.e. not part of the variant name
        .*?             # Any number of any character, in a non-greedy way; aka "whatever comes before the closing curly braces"
    \}}\}}              # Double curly brace closes Ninja expression
    """,
    re.VERBOSE
)

selector_regex = re.compile(
    # This regex matches selectors; e.g. # [py>38]
    rf"""
    ^                   # Starts with... (one thing, I don't know why 🎵)
    [^#\[]*?            # Any character that is not a # or [, any number of times, non-greedy
    \#?                 # A hashtag, optionally
    \s                  # A space
    \[                  # An opening square bracket
        [^\]]*?         # Any character that is not a ], any number of times, non-greedy
        (?<!            # Negative lookbehind assertion ✨ aka "NOT preceded by..."
            [_\w\d]     # ... underscores, letters or digits
        )               # Closes the negative lookbehind assertion
        {v_regex}       # The variant name regex
    [                   # One of
        =\s<>!\]        # equals, spaces, angle brackets, exclamation mark or closing square bracket
                        # ... These are all characters that "terminate" a variant name, like operators 
                        # ... or the end of the selector
    ] 
    """,
    re.VERBOSE
)

@beckermr
Copy link
Contributor Author

@jaimergp my concern is that the conda-build test suite is not robust enough to catch bugs that might be introduced when we change the regex strings to the more verbose style. It'd be better to defer to future PRs that change the strings and introduce more robust tests to ensure nothing breaks.

@jaimergp
Copy link
Contributor

It'd be better to defer to future PRs that change the strings and introduce more robust tests to ensure nothing breaks.

That's ok with me. Just updated my comment because I was already "parsing" the regexes for the review, so might as well write it down.

@@ -749,7 +749,9 @@ def find_used_variables_in_text(variant, recipe_text, selectors_only=False):
continue
v_regex = re.escape(v)
v_req_regex = "[-_]".join(map(re.escape, v.split("_")))
variant_regex = rf"\{{\s*(?:pin_[a-z]+\(\s*?['\"])?{v_regex}[^'\"]*?\}}\}}"
variant_regex = (
rf"\{{\s*(?:pin_[a-z]+\(\s*?['\"])?{v_regex}[^_0-9a-zA-Z].*?\}}\}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the change:

  • The previous regex was non-greedy-matching any characters after the variant name that were NOT a single or double quote. That meant that partial matches were possible (e.g. it found pari in pin_compatible('napari').
  • The new regex tries to non-greedy-match any character after the variant name that is NOT a valid variant name character (underscores, numbers and letters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. We assert the variable name is terminated by a non-allowed character for variables. Then we get the rest of the characters up until the jinja2 statement is closed, asserting it is closed.

@beckermr
Copy link
Contributor Author

shall we merge @jaimergp? no pressure, but I am contemplating adding a patch to the conda-build feedstock if this PR is going to sit for a while.

@jaimergp
Copy link
Contributor

Let's give @beeankha and @ForgottenProgramme a few hours to take a look. The release branch has not been created yet so there's no rush to merge just yet. It does LGTM though.

@jezdez jezdez added this to the 24.11.x milestone Nov 13, 2024
@jezdez jezdez merged commit 41856b5 into conda:main Nov 14, 2024
28 checks passed
@beckermr beckermr deleted the find-vars-unit-tests branch November 14, 2024 17:49
@beeankha beeankha mentioned this pull request Nov 15, 2024
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

6 participants