Description
Warning messages are most often written to standard error. Furthermore, warnings from GitPython, when they are displayed, are in most cases displayed that way unless the caller arranges otherwise, because they are displayed by calling loggers' warning
methods or, for DeprecationWarning
s, by calling warnings.warn
.
However, the warning Git.refresh
displays when running a test command like git --version
(with the command being tested used for git
) fails, and GIT_PYTHON_REFRESH
is set to 1
, w
, warn
, or warning
, is different. It is printed to standard output. It also does not use any logging facility, so attempts to capture it by configuring Python logging will not succeed.
Lines 453 to 454 in d28c20b
If there is a reason for this behavior, then I think it should be commented or otherwise stated. Otherwise, I am not sure what to do. It may still not be feasible to change, because programs that use GitPython may be relying on the warning being printed to standard output, or on it not being printed to standard error. But if he behavior is going to change in the next major version, then that could be commented and/or this issue could be kept open until then. If the reason is unknown and it is undecided whether the behavior should be changed in the next major release, then a comment could be added just saying that it prints to standard output for backward compatibility.
Activity
Byron commentedon Jan 24, 2024
Thanks for bringing this up!
To my mind, libraries have no business with the stdout/stderr of a program, which makes this usage (and those like it) as bug. If other programs rely on the output of a library, they rely on an unstable API without any guarantees.
This is an opening for using the logging facility here instead.
Maybe in relation to #1806, if the logging would be configured to show warnings by default as it is suggested to well-behaved users of the logging module, then this important message should still show up even if GitPython is only used in an ad-hoc fashion. I am sure this was also the reason it prints to stdout, it was such a common issue that it should be assured the users can see it even without any prior configuration to the logging module.
EliahKagan commentedon Jan 24, 2024
I agree.
I think the change for that should just be to no longer prevent that from happening, by no longer registering any
NullHandler
, since doing so is no longer needed. Whether it happens or not would then depend on whether, and how, logging has already been configured at the time thegit
module is imported. If messages of levelWARNING
are for some reason being ignored, such as due to configuration of the root logger (higher than any GitPython loggers), then it would not be seen. I don't think such suppression is common, but logging might be configured in a way that doesn't show messages in a terminal. So long as that's okay, I think this is a good change to make.There is also the question of what logging level should be used for this message. It's described as a warning, currently, but if it really represents an error then it can be logged with the
ERROR
level. The description in the logging levels table forERROR
is, "Due to a more serious problem, the software has not been able to perform some function." Intuitively, that seems right. But I'm not sure it really is, because one way to characterize the situation at that point would be to say that the software hasn't tried and failed to do anything it needs to do; instead, it's tried something it doesn't need to do yet, and the failure is only a prediction of something that might be an error later. Then again, this is a condition that is treated as an error by default and an exception raised, so maybe theERROR
level is the right choice. Then again, the warning is when an environment variable has been set to not treat it as an error...I don't see why that would be, since writing to stderr has never required that logging be configured. It's true that doing so with
print
varied between Python 2 and Python 3, both of which GitPython supported at the time this code was added in a962464 (#640), iffrom __future__ import print_function
was not used:I don't know if that's sufficient to explain why it wasn't done, though.
Byron commentedon Jan 25, 2024
Yes, that's absolutely OK and I would appreciate that. If folks override the logger in their application, they know what they are doing. Those who use
python -m git
would still see these warnings by default, which is what I would hope for.I agree, and this would make it even more likely to actually show up even if the logging is configured at application level. Since this is configurable, maybe that means the message level should be adjusted depending on the GitPython configuration.
Apologies for the confusion. What I tried to say is that in order to avoid that the message might not show up as logging is disabled, it was emitted to stdout or stderr directly. Maybe it's just an oversight, the true reason might be lost in the tides of time.
EliahKagan commentedon Jan 26, 2024
I agree. Common approaches when not doing anything special with logging are to not configure logging or to configure it with
logging.basicConfig
. Either way, if the level is not set explicitly anywhere and nothing is done to suppress messages,WARNING
and higher are shown.(The messages are formatted more nicely when logging is configured, including with
basicConfig
, than when logging is not configured and the handler of last resort ends up displaying messages. But either way, these days, when nothing is done to suppress messages or control what severities are seen, it isWARNING
and higher.)Do you just mean importing and using the
git
module?git
is a package (it's a directory) and it does not contain__main__.py
, so running it as a script, aspython -m
does, gives an error that it can't be directly executed.I ask because one nit I plan to fix at the same time as some of this other logging-related stuff is to make
logging.getLogger
calls consistent across modules in GitPython where inconsistency appears unintentional. Currently, most pass their fully qualified module name by writing__name__
, but a few write it out, e.g.,git/objects/commit.py
containslogging.getLogger("git.objects.commit")
instead oflogging.getLogger(__name__)
. These work the same, with__name__
being preferred, so long as the modules are imported rather than being run as scripts. This is not directly affected by whether the top-levelgit
module is runnable as a script, but if any modules inside thegit/
directory are intended to be runnable as scripts, then I should be aware of that while making that change to use__name__
.Some further considerations:
If a user or application developer adds a
NullHandler
to thegit
logger to restore the old behavior that will be changed for NullHandler added to all but git.repo.base logger #1806, this will also prevent messages (of any severity) logged to thegit
orgit.cmd
logger. So a message fromGit.refresh
, of any severity, would also be suppressed. Thus, by trying to restore one old behavior, a user/developer could end up making another behavior even more different. Then again, suppressing logging probably should suppress such messages. (It would not be suppressed if logged to the root logger instead, but a library should not log to the root logger.)GitPython currently logs some messages of severity
ERROR
. Even a user or application developer takes the approach of configuring logging but setting the logging level (i.e., minimum severity) higher for GitPython loggers, there would still be no easy way to suppress all messages except the one fromGit.refresh
. If the message fromGit.refresh
were changed toCRITICAL
then that would provide an easy way to see only it, since GitPython does not currently use that level for any messages. But I think that may not be justified, especially considering that thegit
module may end up being imported by an application or other library that rarely uses its functionality or that uses it in a fairly unimportant way.Currently the recognized values of
GIT_PYTHON_REFRESH
, which are included in messages, characterize the message as a warning:GitPython/git/cmd.py
Line 420 in 307f198
I think it is fine to add new values, including synonyms of existing values, as well as to change the wording of the message, including how it describes the options. But I think all those values should remain supported. It seems to me that the way they are presented effectively documents them and it would be a breaking change to stop recognizing them. I also think it is valuable to be able to set environment variables in a way that causes any version of GitPython (excluding very old versions) to be configured in the same way. Because values like
warn
should continue to be recognized, this means that users may be surprised if the level is higher. I'm not sure how much of a problem that really is.Should the level of the message itself be configurable with an additional, newly recognized
GIT_PYTHON_*
environment variable? This could mitigate some of the above issues somewhat. My feeling is that the benefit would not be worth the added complexity, and that discoverability would be low unless the message mentions that environment variable too, which it shouldn't since then the message would be overcomplicated and confusing. But I am not sure.Ah, I see. The two of us were focusing on different things there. I was specifically worried about why it was going to standard output rather than standard error, in case that behavior had at some point been needed.
Edit: Fixed a misspelling of
GIT_PYTHON_REFRESH
, which I had written in one place asGIT_REFRESH
.EliahKagan commentedon Jan 26, 2024
Can you elaborate on this?
(I've posted this as its own comment since it seems like the issue may be especially important, and I don't want you to feel pressured to respond to my various thoughts above, or to do so at the same time, if this is the priority.)
Have initial refresh use a logger to warn
Byron commentedon Jan 26, 2024
I used this example as I know this imports
git
(GitPython) and runs its initialization, which should include checking for agit
binary, but maybe I am mistaken. It was my hope that doing so would print a message, like it might have done before.If anything in the
git
module is usable without a known to be working Git executable, then leaving the severity withERROR
might be OK. If meaningful operation isn't required, it could probably even beCRITICAL
. Maybe this can be going along with the severity configuration that already is supported, by adding another level to make that configurable.I agree that this configuration should keep working, and maybe I am wrong to assume that this can integrate nicely with logging severities.
I see, each message comes with a severity, and we would be talking about increasing the severity of this message in particular, which is hard-coded. Indeed, I wouldn't want this to get anymore complicated and don't think users should get one more way to configure this.
I hope this is explained above, and also think that once the messaging of
git.refresh
and friends is using logging after #1813 is merged, it will be alright (or can be made to be alright).EliahKagan commentedon Jan 26, 2024
You are correct and I was simply mistaken here. My mind was already so much on the issue of what modules if any in GitPython might be runnable as a script (relevant to 987dbf4), I didn't think to check my belief that the check for
__main__.py
precedes running the top-level module code of__init__.py
. That belief of mine turns out to be false (and also didn't really make sense, because almost everything in Python is dynamic, and if__init__.py
didn't run first then it wouldn't be able to patch in__main__
or customize the import system before__main__
is searched, in the rare case that one would want to do this). Runningpython -m git
is indeed an effective way to exercise the first-refresh behavior of GitPython; it andpython -c 'import git'
are equally good. Sorry about the confusion!I'll try to reply to the rest soon, but I wanted to get this out since it addresses an inaccuracy in what I had said above.
EliahKagan commentedon Jan 28, 2024
If no new values of
GIT_PYTHON_REFRESH
are added and no additional environment variable is consulted, then I think the question is whether the level should reflect the fact thatGIT_PYTHON_REFRESH
was set to not raise an exception and take that to mean the situation is less severe. Thegit.refresh
function itself is usable whengit
is not found, after all.I think on reflection that it's at least an
ERROR
because, unlike with aWARNING
, GitPython may not be expected to continue working. I'm not sure aboutCRITICAL
. But ifGIT_PYTHON_REFRESH
is to be understood as indicating only how the problem should be reported, rather than as indicating how severe the problem is, thenCRITICAL
may make sense.The reason I'm not sure how well they would work is that the current recognized options that can be set via
GIT_PYTHON_REFRESH
have both symbolic and numerical values, and logging levels have numerical as well as symbolic values, and the numerical values are different; they are small numbers forGIT_PYTHON_REFRESH
currently (which should continue to be supported) and numbers like 10, 20, and so forth, for logging levels.One way to get around that is to add another environment variable. But then we have another environment variable, have to decide what to do when it is unset as well as whether or not and how to report when its values makes no sense, and complexity for the user may be increased if this information is all presented in a message.
Other than the above issue with how it would be specified, I don't think the level/severity needs to be hard-coded. The
log
method can be used with a dynamically determined level; I don't think there's anything inherently wrong with doing that.Byron commentedon Jan 28, 2024
Maybe you have an idea or preference on how this issue can be resolved by using a logger to report the matter in the most suitable way. I'd love to go with that :).
EliahKagan commentedon Jan 28, 2024
When working on #1813, I wrote dc62096, which I didn't include there because, as noted there, I think it should have tests, which should build on the test changes in #1812. Both #1812 and #1813 are merged (and they will be able to run on Cygwin, since #1814), so those tests can be written and proposed.
So I am thinking I'll go with an approach like in dc62096 initially, but with tests, and possibly with
ERROR
changed toCRITICAL
depending on the outcome of a quick-ish search for how projects tend to decide between the two levels in ambiguous circumstances.Then if a release ends up happening between that and future changes to this feature, future changes might be constrained by the need to avoid breakage or confusion. But even then, it could still probably be built on further.
Delays in implementing this have not been due to design questions here, but instead the CI problems discovered in #1813 and worked around in #1814. I may still not get to this immediately, mostly for reasons not related (not even indirectly) to GitPython, but in part because I agree with the idea that the cause and exact conditions of the problem worked around in #1814 should be further investigated.
With all that said, I do have another idea for how the logging level could be specified. I don't know if it's really a good idea, and it's not part of my initial plan as described above.
The
:
followed by a logging level would always be optional. It would be ignored unless the part before the:
is 1, w, etc.The case that could be confusing is:
Byron commentedon Jan 29, 2024
Thanks for sharing - I am already looking forward to what might cause these hangs deeply under the hood of C-Python apparently.
Event though I think this could work while being backward compatible, some UX issues notwithstanding, my question is if there could ever be user-side demand for this. Maybe the demand appears with awareness though, which can be provided of course as part of the printed messages. In any case, I am open to that for when the time comes.
Test current initial-refresh behavior
Change warning refresh-mode tests to expect logging
Have initial refresh use a logger to warn