-
-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more checks for the validity of refnames #1672
Add more checks for the validity of refnames #1672
Conversation
This change adds checks based on the rules described in [0] in order to more robustly check a refname's validity. [0]: https://git-scm.com/docs/git-check-ref-format
To add a bit more context: I followed the general approach mentioned by @Byron and used by gitoxide, which is a For comparison, here's the naive approach, where the logic separation matches the docs (one rule per def _check_ref_name_valid_naive(ref_path: PathLike) -> None:
# Based on https://git-scm.com/docs/git-check-ref-format/
if any([component.startswith(".") or component.endswith(".lock") for component in ref_path.split("/")]):
raise ValueError(f"Invalid reference '{ref_path}': components cannot start with '.' or end with '.lock'")
elif ".." in str(ref_path):
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '..'")
elif any([ord(c) < 32 or ord(c) == 127 or c in [" ", "~", "^", ":"] for c in ref_path]):
raise ValueError(
f"Invalid reference '{ref_path}': references cannot contain ASCII control characters, spaces, tildes (~), carets (^) or colons (:)"
)
elif any([c in ["?", "*", "["] for c in ref_path]):
raise ValueError(
f"Invalid reference '{ref_path}': references cannot contain question marks (?), asterisks (*) or open brackets ([)"
)
elif ref_path.startswith("/") or ref_path.endswith("/") or "//" in ref_path:
raise ValueError(f"Invalid reference '{ref_path}': references cannot start or end with '/', or contain '//")
elif ref_path.endswith("."):
raise ValueError(f"Invalid reference '{ref_path}': references cannot end with '.'")
elif "@{" in ref_path:
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '@{{'")
elif ref_path == "@":
raise ValueError(f"Invalid reference '{ref_path}': references cannot be '@'")
elif "\\" in ref_path:
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '\\'") The naive approach is IMO more readable, but around half as fast as the one in the PR. Although, for reference, in my MacBook M1 Pro, for a refname 25 characters long:
So we are talking about minimal amounts either way. I'll leave the choice of which algorithm to use up to the maintainers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a million, I love this implementation!
Strangely enough, I find the faster version (the one here) more readable as well and would want to keep it for that reason alone.
There is one issue I see that might be hard to solve, but it's time to at least try. It's the general problem of how to interact with paths without running into decoding problems (i.e. Python tries to decode a path as decoding X, and fails, even though it's a valid filesystem path). Maybe @EliahKagan also has ideas regarding this topic.
# Based on the rules described in https://git-scm.com/docs/git-check-ref-format/#_description | ||
previous: Union[str, None] = None | ||
one_before_previous: Union[str, None] = None | ||
for c in str(ref_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid converting to str
? I assume this tries to decode ref_path
with the current string encoding, which changes depending on the interpreter or user configuration and generally causes a lot of trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this PR worsens that problem in some way, which I believe it does not, I would recommend it be fixed separately and later. The code this is replacing already had:
GitPython/git/refs/symbolic.py
Lines 171 to 172 in d40320b
if ".." in str(ref_path): | |
raise ValueError(f"Invalid reference '{ref_path}'") |
But actually even that neither introduced nor exacerbated the problem. From the commit prior to #1644 being merged:
GitPython/git/refs/symbolic.py
Lines 164 to 174 in 830025b
@classmethod | |
def _get_ref_info_helper( | |
cls, repo: "Repo", ref_path: Union[PathLike, None] | |
) -> Union[Tuple[str, None], Tuple[None, str]]: | |
"""Return: (str(sha), str(target_ref_path)) if available, the sha the file at | |
rela_path points to, or None. target_ref_path is the reference we | |
point to, or None""" | |
tokens: Union[None, List[str], Tuple[str, str]] = None | |
repodir = _git_dir(repo, ref_path) | |
try: | |
with open(os.path.join(repodir, str(ref_path)), "rt", encoding="UTF-8") as fp: |
Note how str(ref_path)
was passed to os.path.join
, which when given str
s returns a str
, thus a str
was being passed to open
. Note also that, while this str
call was actually redundant (os.path.join
accepts path-like objects since Python 3.6), even it was not the cause of str
and not bytes
being used. The annotation on ref_path
is Union[PathLike, None]
, where PathLike
is:
Line 43 in 830025b
PathLike = Union[str, "os.PathLike[str]"] |
Where both alternatives--str
and os.PathLike[str]
--represent text that has already been decoded.
So unless I'm missing something--which I admit I could be--I don't think it makes conceptual sense to do anything about that in this pull request. Furthermore, unless the judgment that CVE-2023-41040 was a security vulnerability was mistaken, or something about the variation explicated in #1644 (comment) is less exploitable, it seems to me that this pull request is fixing a vulnerability. Assuming that is the case, then I think this should avoid attempting to make far-reaching changes beyond those that pertain to the vulnerability, and that although reviewing these changes for correctness should not be rushed, other kinds of delays should be mostly avoided. With good regression tests included, as seems to be the case, the code could be improved on later in other ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the thorough assessment, I wholeheartedly agree.
The 'how to handle paths correctly' issue is definitely one of the big breaking points in GitPython, but maybe, for other reasons, this wasn't ever a problem here.
Knowing this is on your radar, maybe one day there will be a solution to it. gitoxide
already solves this problem, but it's easier when you have an actual type system and a standard library that makes you aware every step of the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ultimate goal to support both str
-based and bytes
-based ref names and paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is correctness, and it's vital that one doesn't try to decode paths to fit some interpreter-controlled encoding. Paths are paths, and if you are lucky, they can be turned into bytes. On Unix, that's always possible and a no-op, but on windows it may require a conversion. It's just the question how these things are supposed to work in python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this relate (conceptually, I mean) to the issue in rust-lang/rust#12056?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A great find :) - yes, that's absolutely related. gitoxide
internally handles git-paths as bundles of bytes without known encoding, and just like git
, it assumes at least ASCII. Conversions do happen but they all go through gix-path
to have a central place for it.
Doing something like it would be needed here as well, even though I argue that before that happens universally, there should be some clear definition of what GitPython is supposed to be.
When I took it over by contributing massively, just like you do now, I needed more control for the use-case I had in mind, and started implementing all these sloppy pure-python components that don't even get the basics right. With that I turned GitPython into some strange hybrid which I think didn't do it any good besides maybe being a little faster for some usecases. After all, manipulating an index in memory has advantages, but there are also other ways to do it while relying on git
entirely.
Maybe this is thinking a step too far, but I strongly believe that the true benefit of GitPython is to be able to call git
in a simple manner and to be compliant naturally due to using git
directly. This should be its identity.
But then again, it's worth recognizing that changing the various pure-python implementations to us git
under the hood probably isn't possible in a non-breaking way.
Another avenue would be to try and get the APIs to use types that don't suffer from encoding/decoding issues related to Paths, and then one day make the jump to replacing the pure-python implementations with the python bindings of gitoxide
.
It looks like CI improved and now that the PR was merged, it failed CI due to a lint: https://github.com/gitpython-developers/GitPython/actions/runs/6271134895/job/17030195508#step:4:122 . A quick fix will be appreciated. Edit: I quickly fixed it myself - it seems like sometimes I forget that I am still able to edit text, despite it being python. |
@Byron Because the forthcoming 3.1.37 release that will include this patch will be a security fix, either the existing advisory/CVE should be updated with a correction (both a note and the version change), or a new CVE should be created for the variant of the vulnerability reported at #1644 (comment). I am not sure which of those things should be done here. Usually I would lean toward regarding such things as new bugs meriting new advisories/CVEs, which is also what I see more often. But I do not know that that's the best approach here, because the variant of the exploit where an absolute (or otherwise non-relative) path is used does seem to match the description in the summary section of CVE-2023-41040 even though it doesn't resemble any of the examples. To be clear, I don't mean that this situation is necessarily ambiguous, but instead that I do not have the knowledge and experience to know how it ought to be handled. Either way, this need not delay the release, of course. (Sorry if you're already on top of the CVE/advisory matter and this comment is just noise.) |
Thanks for the hint, it's appreciated! I think it's fair to say that I am not on top of CVEs and that I have no intention to be - even though this sounds harsh it's just the current reality. But thus far members of the community picked up the necessary work around CVEs which I definitely appreciate if this would keep happening. |
A new release was created: https://pypi.org/project/GitPython/3.1.37/ |
Given the 3.1.37 release title ("3.1.37 - a proper fix CVE-2023-41040") I'm thinking this is intuitively being regarded as fixing the originally reported vulnerability, so perhaps that advisory should be updated, rather than a new one created? I am still not sure. As noted in #1638 (comment), you (@Byron) could update the local advisory. If you do, a PR could then also be opened on https://github.com/github/advisory-database (where github/advisory-database#2690 was opened) to change the global advisory accordingly. I don't know if there's anything else that would need to be done. @stsewd Do you have any opinion about what ought to be done here? Would you have any objection to the local advisory being edited this way? Would you instead prefer that this variant, where an absolute path is used, be regarded as a related but separate vulnerability altogether? (I know @Byron can edit the advisory, but I wanted to check in case you had an opinion on this.) One source of my hesitancy here is that I think a new CVE may still be needed in this kind of situation. That seems common (courtesy of this SO answer). |
A good point - I am still getting used to advisories and the local ones are indeed editable. So that one has been adjusted. I kindly ask somebody else to create a PR for the global database though - it seems GitHub makes it hard/impossible to the use web interface for that.
To me, CVEs are good to create a far-reaching 'ping' to users of GitPython. Some might see it earlier than the new release. To me it's the question on how much time one wants to spend to create such a ping, and judging from the CVE's I have seen, it's quite expensive. |
Thanks, but I'm not actually sure if it was a good point. Maybe a new advisory ought to have been, or ought to be, created. I really don't know the proper thing to do here. |
If there is an uproar because of how this was handled, it will be possible to undo changes to the local CVE and create a new one. So I think nothing is lost, and I think it's OK to chose less expensive options in dealing with this. |
Hi, a new CVE/advisory is usually created for this type of situation, and in the description you can put something like "this was a due to an incomplete fix of [link to the other CVE]". I don't oppose to edit the current one, but I guess editing doesn't have the same "ping to everyone to upgrade" effect as a new one. |
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [GitPython](https://togithub.com/gitpython-developers/GitPython) | `==3.1.36` -> `==3.1.37` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/GitPython/3.1.37?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/GitPython/3.1.37?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/GitPython/3.1.36/3.1.37?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/GitPython/3.1.36/3.1.37?slim=true)](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>gitpython-developers/GitPython (GitPython)</summary> ### [`v3.1.37`](https://togithub.com/gitpython-developers/GitPython/releases/tag/3.1.37): - a proper fix CVE-2023-41040 [Compare Source](https://togithub.com/gitpython-developers/GitPython/compare/3.1.36...3.1.37) #### What's Changed - Improve Python version and OS compatibility, fixing deprecations by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1654](https://togithub.com/gitpython-developers/GitPython/pull/1654) - Better document env_case test/fixture and cwd by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1657](https://togithub.com/gitpython-developers/GitPython/pull/1657) - Remove spurious executable permissions by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1658](https://togithub.com/gitpython-developers/GitPython/pull/1658) - Fix up checks in Makefile and make them portable by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1661](https://togithub.com/gitpython-developers/GitPython/pull/1661) - Fix URLs that were redirecting to another license by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1662](https://togithub.com/gitpython-developers/GitPython/pull/1662) - Assorted small fixes/improvements to root dir docs by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1663](https://togithub.com/gitpython-developers/GitPython/pull/1663) - Use venv instead of virtualenv in test_installation by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1664](https://togithub.com/gitpython-developers/GitPython/pull/1664) - Omit py_modules in setup by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1665](https://togithub.com/gitpython-developers/GitPython/pull/1665) - Don't track code coverage temporary files by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1666](https://togithub.com/gitpython-developers/GitPython/pull/1666) - Configure tox by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1667](https://togithub.com/gitpython-developers/GitPython/pull/1667) - Format tests with black and auto-exclude untracked paths by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1668](https://togithub.com/gitpython-developers/GitPython/pull/1668) - Upgrade and broaden flake8, fixing style problems and bugs by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1673](https://togithub.com/gitpython-developers/GitPython/pull/1673) - Fix rollback bug in SymbolicReference.set_reference by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1675](https://togithub.com/gitpython-developers/GitPython/pull/1675) - Remove `@NoEffect` annotations by [@​EliahKagan](https://togithub.com/EliahKagan) in [https://github.com/gitpython-developers/GitPython/pull/1677](https://togithub.com/gitpython-developers/GitPython/pull/1677) - Add more checks for the validity of refnames by [@​facutuesca](https://togithub.com/facutuesca) in [https://github.com/gitpython-developers/GitPython/pull/1672](https://togithub.com/gitpython-developers/GitPython/pull/1672) **Full Changelog**: gitpython-developers/GitPython@3.1.36...3.1.37 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/allenporter/flux-local). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi45Ny4xIiwidXBkYXRlZEluVmVyIjoiMzYuOTcuMSIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Thanks! @Byron Based on this, and also what I am now seeing is the recent history of this practice being followed for GitPython in CVE-2022-24439/CVE-2023-40267, I recommend making a new advisory. Maybe there is some way I can help with this? However, if for any reason you would still prefer this route not be taken, then I can definitely go ahead and open a PR to update the global advisory with the version change. (I am unsure if that would cause Dependabot to notify users of the security update or not, but I imagine that, if it would not, then a reviewer on the PR would mention that.)
I have three ideas of what I could do, but I don't know what, if any of them, would help or be wanted. This depends, in part, on what takes up the time for you.
|
Let's try something: updating version numbers is much cheaper than creating a new 'follow-up' CVE, for all sides actually. One could ask in the PR of the version change if notifications will be sent, and if unknown, @stsewd could probably help to tell as well. If no notification is sent, you could create a new CVE - you would be able to do this here in GitPython and from there it can be elevated, along with requesting a global CVE for it - this is easily done through the maintainer interface. The rest we can take from there should it come to that. |
Sounds good; I will do this. I noticed in the local advisory that, while
I'm making the PR through the structured "Suggest improvements" template, in which every field is pretty specific. I'll either include it somewhere if it fits, or otherwise try and add it into the created PR or add a comment with it.
Thanks for telling me about that. That is much nicer than the particular specific I had suggested might be used. Of course, I'll still save that for if the above proves insufficient, as you say. |
Thanks for the head's up, that's an oversight that is now corrected. And thanks again for your help! |
No problem! I've submitted the proposed edit to the global advisory in PR github/advisory-database#2753. |
github/advisory-database#2753 has been merged and the global GitHub advisory for CVE-2023-41040 has thus been updated. |
Further update: The change to the global advisory has caused Dependabot security alerts to be raised, as desired. For example, dmvassallo/EmbeddingScratchwork#248 is a PR opened automatically to resolve a new Dependabot security alert in a project where GitPython had already been upgraded to the previously listed version. Note that this does not necessarily apply for tools that are less closely coupled to the GitHub ecosystem, and I don't know, for example, if any new Renovatebot PRs will be generated. |
Bump gitpython from 3.1.35 to 3.1.37 Bumps gitpython from 3.1.35 to 3.1.37. Release notes Sourced from gitpython's releases. 3.1.37 - a proper fix CVE-2023-41040 What's Changed Improve Python version and OS compatibility, fixing deprecations by @EliahKagan in gitpython-developers/GitPython#1654 Better document env_case test/fixture and cwd by @EliahKagan in gitpython-developers/GitPython#1657 Remove spurious executable permissions by @EliahKagan in gitpython-developers/GitPython#1658 Fix up checks in Makefile and make them portable by @EliahKagan in gitpython-developers/GitPython#1661 Fix URLs that were redirecting to another license by @EliahKagan in gitpython-developers/GitPython#1662 Assorted small fixes/improvements to root dir docs by @EliahKagan in gitpython-developers/GitPython#1663 Use venv instead of virtualenv in test_installation by @EliahKagan in gitpython-developers/GitPython#1664 Omit py_modules in setup by @EliahKagan in gitpython-developers/GitPython#1665 Don't track code coverage temporary files by @EliahKagan in gitpython-developers/GitPython#1666 Configure tox by @EliahKagan in gitpython-developers/GitPython#1667 Format tests with black and auto-exclude untracked paths by @EliahKagan in gitpython-developers/GitPython#1668 Upgrade and broaden flake8, fixing style problems and bugs by @EliahKagan in gitpython-developers/GitPython#1673 Fix rollback bug in SymbolicReference.set_reference by @EliahKagan in gitpython-developers/GitPython#1675 Remove @NoEffect annotations by @EliahKagan in gitpython-developers/GitPython#1677 Add more checks for the validity of refnames by @facutuesca in gitpython-developers/GitPython#1672 Full Changelog: gitpython-developers/[email protected] Commits b27a89f fix makefile to compare commit hashes only 0bd2890 prepare next release 832b6ee remove unnecessary list comprehension to fix CI e98f57b Merge pull request #1672 from trail-of-forks/robust-refname-checks 1774f1e Merge pull request #1677 from EliahKagan/no-noeffect a4701a0 Remove @NoEffect annotations d40320b Merge pull request #1675 from EliahKagan/rollback d1c1f31 Merge pull request #1673 from EliahKagan/flake8 e480985 Tweak rollback logic in log.to_file ff84b26 Refactor try-finally cleanup in git/ Additional commits viewable in compare view Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase. Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: @dependabot rebase will rebase this PR @dependabot recreate will recreate this PR, overwriting any edits that have been made to it @dependabot merge will merge this PR after your CI passes on it @dependabot squash and merge will squash and merge this PR after your CI passes on it @dependabot cancel merge will cancel a previously requested merge and block automerging @dependabot reopen will reopen this PR if it is closed @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the Security Alerts page. Reviewed-by: Vladimir Vshivkov
This change adds checks based on the rules described in the docs in order to more robustly check a refname's validity.