Skip to content

Fix working directory reuse bugs and use persistent cache dir by default#82

Open
RadekManak wants to merge 3 commits intoopenshift-eng:mainfrom
RadekManak:fix-rebase-workdir-contamination
Open

Fix working directory reuse bugs and use persistent cache dir by default#82
RadekManak wants to merge 3 commits intoopenshift-eng:mainfrom
RadekManak:fix-rebase-workdir-contamination

Conversation

@RadekManak
Copy link
Contributor

Three fixes for _init_working_dir() related to working directory reuse:

  • Reinitialize .git when source URL changes: When rebasebot reuses the
    working directory (the default) with a different source repository, stale
    tags and branches from the previous source accumulate in the local git
    object store. These corrupt ancestry checks, commit filtering, and
    cherry-pick detection. If the source remote URL has changed since the last
    run, .git is now deleted and reinitialized before fetching, ensuring clean
    refs.

  • Scope credential.username per repository URL: The global
    credential.username = x-access-token config was applied to all remotes
    including source, causing git to attempt authentication with no password
    when fetching from a public source repository. The username entry is now
    written only into the per-URL credential sections for dest and rebase.

  • Use XDG-aware persistent cache directory by default: The previous default
    of .rebase was relative to the current working directory, meaning the cache
    location would vary depending on where rebasebot was invoked from. The new
    default is $XDG_CACHE_HOME/rebasebot (falling back to ~/.cache/rebasebot),
    which is stable, user-specific, and outside the working directory.

Fixes: #79
Fixes: #36

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2026
@RadekManak RadekManak force-pushed the fix-rebase-workdir-contamination branch from f27175c to 0eb5dc9 Compare February 27, 2026 14:10
A source URL switch can leave stale refs in the existing .git directory and corrupt ancestry checks during rebase, so recreate git metadata to ensure clean refs.
Set credential.username under each repository-specific credential section so credential config no longer leaks globally across remotes.
Switch the CLI default from `.rebase` to an XDG-aware cache location, treat empty XDG_CACHE_HOME as unset, and add tests/documentation for the new behavior.
@RadekManak RadekManak force-pushed the fix-rebase-workdir-contamination branch from 0eb5dc9 to 2ec3e43 Compare February 27, 2026 14:50
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 27, 2026

@kaovilai: changing LGTM is restricted to collaborators

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, RadekManak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RadekManak
Copy link
Contributor Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Mar 11, 2026

✅ Actions performed

Full review triggered.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2026
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 11, 2026

Walkthrough

Updated rebasebot to use a persistent cache directory (e.g., $XDG_CACHE_HOME/rebasebot or ~/.cache/rebasebot) as the default working directory instead of .rebase in the current directory. Modified credential configuration from global to per-repository scope. Updated tests to use monkeypatch for directory changes.

Changes

Cohort / File(s) Summary
Documentation
README.md
Updated to reflect the new default working directory as a persistent cache location outside the current working directory.
Credential and Repository Initialization
rebasebot/bot.py
Added shutil import. Modified _init_working_dir to detect and handle source remote URL changes by removing and reinitializing the .git directory. Changed credential configuration from global credential.username to per-repository credentials using the credential "<url>" syntax for dest and rebase remotes.
CLI and Working Directory Management
rebasebot/cli.py
Introduced _default_working_dir() helper to resolve the default working directory using XDG_CACHE_HOME or ~/.cache/rebasebot. Modified rebasebot_run to compute and use the working directory with proper current working directory (CWD) management via a try/finally block to ensure CWD restoration.
Test Updates
tests/test_bot.py, tests/test_cli.py
Updated test method signatures to include the monkeypatch fixture. Replaced direct os.chdir() calls with monkeypatch.chdir() in test cases. Extended CLI tests to validate working directory defaulting logic and end-to-end behavior with XDG_CACHE_HOME resolution. Updated the valid_args_dict fixture to use tmp_path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective Addressed Explanation
Move credential configuration from global to per-repository scope [#79]
Use persistent cache directory instead of .rebase in current working directory [#36]
Handle source remote URL changes to prevent stale refs [#79]
Remove filter="blob:none" parameter or add + prefix to refspecs [#79] The raw summary does not explicitly mention modifications to fetch calls, filter parameters, or refspec prefixes in _init_working_dir. Unclear whether these changes from issue #79 are included in this PR.

Out-of-scope changes

Code Change Explanation
Test fixture parameterization for valid_args_dict (tests/test_cli.py) The fixture now depends on tmp_path parameter, extending its scope beyond the working directory feature to include temporary directory test isolation. While beneficial, this test infrastructure change is not directly tied to the stated objectives of #36 or #79.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
rebasebot/bot.py (1)

555-559: Please add direct _init_working_dir() regression tests.

The two highest-risk changes in this PR live here: repo-scoped credential sections and .git reinitialization on source changes. The added tests only cover CLI working-dir resolution, so these behaviors are still unpinned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rebasebot/bot.py` around lines 555 - 559, Add direct regression tests that
call the _init_working_dir() function to cover repo-scoped credential sections
and .git reinitialization on source changes: write unit tests that (1) create a
temporary repo, set a credential section like f'credential "{repo}"' and verify
config.set_value results in correct credential entry when _init_working_dir()
runs, and (2) simulate changing the source (e.g., different remote URL or empty
.git) and assert that _init_working_dir() reinitializes the .git directory/state
as expected; place tests to invoke rebasebot.bot._init_working_dir() directly
(not via CLI) so these behaviors are pinned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rebasebot/cli.py`:
- Around line 44-47: The function _default_working_dir currently accepts a
relative XDG_CACHE_HOME; update it to reject non-absolute XDG_CACHE_HOME per the
XDG spec by reading os.environ.get("XDG_CACHE_HOME") into cache_home_raw,
checking if cache_home_raw and os.path.isabs(cache_home_raw) are True, and only
then use cache_home = cache_home_raw; otherwise fall back to
os.path.join(os.path.expanduser("~"), ".cache"); finally return
os.path.join(cache_home, "rebasebot"). Ensure you reference the
_default_working_dir function and the XDG_CACHE_HOME env var in the change.

---

Nitpick comments:
In `@rebasebot/bot.py`:
- Around line 555-559: Add direct regression tests that call the
_init_working_dir() function to cover repo-scoped credential sections and .git
reinitialization on source changes: write unit tests that (1) create a temporary
repo, set a credential section like f'credential "{repo}"' and verify
config.set_value results in correct credential entry when _init_working_dir()
runs, and (2) simulate changing the source (e.g., different remote URL or empty
.git) and assert that _init_working_dir() reinitializes the .git directory/state
as expected; place tests to invoke rebasebot.bot._init_working_dir() directly
(not via CLI) so these behaviors are pinned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 02f62e80-d513-43d5-97b5-c2f1bee8786d

📥 Commits

Reviewing files that changed from the base of the PR and between 6f0004b and 2ec3e43.

📒 Files selected for processing (5)
  • README.md
  • rebasebot/bot.py
  • rebasebot/cli.py
  • tests/test_bot.py
  • tests/test_cli.py

Comment on lines +522 to +536
# If the source URL changed, stale refs from the previous source repo remain
# in the local git store and can corrupt rebase operations (wrong ancestry
# checks, wrong commit filtering, wrong cherry-pick detection). Reinitializing
# .git is the only safe way to clear them.
if "source" in gitwd.remotes and gitwd.remotes["source"].url != source.url:
logging.warning(
"Source URL changed from %s to %s; reinitializing working directory "
"to remove stale refs",
gitwd.remotes["source"].url,
source.url,
)
git_dir = gitwd.git_dir
gitwd.close()
shutil.rmtree(git_dir)
gitwd = git.Repo.init(path=workdir)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "*.py" -path "*/rebasebot/*" | head -20

Repository: openshift-eng/rebasebot

Length of output: 181


🏁 Script executed:

cd rebasebot && wc -l bot.py

Repository: openshift-eng/rebasebot

Length of output: 77


🏁 Script executed:

sed -n '520,560p' rebasebot/bot.py

Repository: openshift-eng/rebasebot

Length of output: 1546


🏁 Script executed:

# Search for fetch-related code in bot.py
rg -n "fetch" rebasebot/bot.py -A 3 -B 1

Repository: openshift-eng/rebasebot

Length of output: 1899


🏁 Script executed:

# Look for filter="blob:none" references
rg -n "filter" rebasebot/bot.py -B 2 -A 2

Repository: openshift-eng/rebasebot

Length of output: 1191


🏁 Script executed:

# Check for git version checks or compatibility notes
rg -n "git.*2\.52|version|compatibility" rebasebot/bot.py -i

Repository: openshift-eng/rebasebot

Length of output: 145


🏁 Script executed:

# Search for test files related to working directory initialization
find . -name "*test*.py" -type f | xargs rg -l "init_working_dir" 2>/dev/null

Repository: openshift-eng/rebasebot

Length of output: 90


🏁 Script executed:

# Check the issue `#79` context if mentioned in comments or docs
find . -name "*.md" -o -name "*.txt" -o -name "*.rst" | xargs rg -l "#79|issue.*79" 2>/dev/null | head -5

Repository: openshift-eng/rebasebot

Length of output: 49


🏁 Script executed:

# Examine the fetch calls and their parameters in detail
sed -n '409p;572p;573p;576p;579p;582p;610p' rebasebot/bot.py

Repository: openshift-eng/rebasebot

Length of output: 455


🏁 Script executed:

# Look at the test file to see coverage of reused workdir scenarios
head -100 tests/test_rebases.py

Repository: openshift-eng/rebasebot

Length of output: 3167


🏁 Script executed:

# Search for any git version compatibility or 2.52 issues mentioned
rg -n "2\.52|forced|force.*fetch|--force" rebasebot/ tests/ -B 2 -A 2

Repository: openshift-eng/rebasebot

Length of output: 49


🏁 Script executed:

# Check for any documentation or comments about the git 2.52 issue
find . -name "*.md" -o -name "*.txt" -o -name "*.rst" | xargs cat 2>/dev/null | rg -i "git.*2\.52|blob:none" -B 3 -A 3 | head -50

Repository: openshift-eng/rebasebot

Length of output: 49


🏁 Script executed:

# Look for git version checks in the codebase
rg -n "version|compatibility" rebasebot/ -i | grep -i git | head -20

Repository: openshift-eng/rebasebot

Length of output: 283


🏁 Script executed:

# Check git module to understand fetch parameters
python3 << 'EOF'
# Check GitPython documentation for fetch() behavior
import inspect
try:
    from git import Repo
    # Print the fetch method signature if available
    help(Repo.remotes.__class__)
except:
    print("GitPython not available in this environment")
EOF

Repository: openshift-eng/rebasebot

Length of output: 110


🏁 Script executed:

# Search for PR or issue references in commits or documentation
git log --all --oneline | grep -i "#79\|issue.*79" | head -10

Repository: openshift-eng/rebasebot

Length of output: 49


🏁 Script executed:

# Look for any comments or context about the reused workdir issue
rg -n "reused|workdir.*reuse|stale.*ref|ref.*stale" rebasebot/ -i -B 2 -A 2

Repository: openshift-eng/rebasebot

Length of output: 771


🏁 Script executed:

# Check if there are any environment variables or configuration for git behavior
rg -n "GIT_|git.*config|git.*option" rebasebot/ -B 1 -A 1

Repository: openshift-eng/rebasebot

Length of output: 1389


🏁 Script executed:

# Search for the actual issue or PR comments in markdown files or docs
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "CHANGELOG*" -o -name "*.rst" \) -exec cat {} \; 2>/dev/null | head -500

Repository: openshift-eng/rebasebot

Length of output: 20153


🏁 Script executed:

# Look more carefully at how remotes are used in rebase operations
sed -n '570,620p' rebasebot/bot.py

Repository: openshift-eng/rebasebot

Length of output: 2268


🏁 Script executed:

# Check if there are any comments about forced updates or ref handling
rg -n "forced\|force\|ref.*contaminat\|stale" rebasebot/bot.py -i

Repository: openshift-eng/rebasebot

Length of output: 49


🏁 Script executed:

# Look at the entire _init_working_dir function to understand the complete flow
sed -n '500,650p' rebasebot/bot.py

Repository: openshift-eng/rebasebot

Length of output: 5675


The reused-workdir fix is incomplete and does not address known compatibility issues.

The reinit logic (lines 522–536) only handles the "source URL changed" case. When remote refs are rewritten in the same repository without a URL change, stale refs can still corrupt the working directory. Additionally, the fetch calls at lines 579 and 582 unconditionally use filter="blob:none", but the project's own AGENTS.md documents that issue #79 explicitly identified this filter as problematic on git 2.52—yet no version check or conditional handling exists in the code to avoid this on incompatible git versions.

Test coverage for reused workdir scenarios with ref rewrites and git 2.52 compatibility is absent.

Comment on lines +44 to +47
def _default_working_dir() -> str:
cache_home = os.environ.get("XDG_CACHE_HOME") or os.path.join(
os.path.expanduser("~"), ".cache")
return os.path.join(cache_home, "rebasebot")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

Script executed:

cat -n rebasebot/cli.py | sed -n '40,50p'

Repository: openshift-eng/rebasebot


Repository: openshift-eng/rebasebot
Exit code: 0

stdout:

    40	        except ValueError as e:
    41	            parser.error(str(e))
    42	
    43	
    44	def _default_working_dir() -> str:
    45	    cache_home = os.environ.get("XDG_CACHE_HOME") or os.path.join(
    46	        os.path.expanduser("~"), ".cache")
    47	    return os.path.join(cache_home, "rebasebot")
    48	
    49	
    50	# parse_cli_arguments parses command line arguments using argparse and returns

XDG Base Directory specification XDG_CACHE_HOME absolute path requirement


Yes. The XDG Base Directory Specification states that all paths set in the XDG environment variables—including XDG_CACHE_HOMEmust be absolute; if an implementation encounters a relative path, it should treat it as invalid and ignore it. [1]

Also, if XDG_CACHE_HOME is unset or empty, the specified default is $HOME/.cache. [1]

Source: XDG Base Directory Specification v0.8 (May 8, 2021). [1]


Reject relative XDG_CACHE_HOME values per XDG Base Directory Specification.

The XDG Base Directory Specification (v0.8) requires that all XDG environment variables, including XDG_CACHE_HOME, be absolute paths; relative paths must be treated as invalid and ignored, falling back to the default $HOME/.cache. The current code accepts relative paths unchecked, which would resolve them relative to the caller's current working directory—reintroducing the persistence problem this change aims to eliminate.

Proposed fix
 def _default_working_dir() -> str:
-    cache_home = os.environ.get("XDG_CACHE_HOME") or os.path.join(
-        os.path.expanduser("~"), ".cache")
+    cache_home = os.environ.get("XDG_CACHE_HOME")
+    if not cache_home or not os.path.isabs(cache_home):
+        cache_home = os.path.join(os.path.expanduser("~"), ".cache")
     return os.path.join(cache_home, "rebasebot")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rebasebot/cli.py` around lines 44 - 47, The function _default_working_dir
currently accepts a relative XDG_CACHE_HOME; update it to reject non-absolute
XDG_CACHE_HOME per the XDG spec by reading os.environ.get("XDG_CACHE_HOME") into
cache_home_raw, checking if cache_home_raw and os.path.isabs(cache_home_raw) are
True, and only then use cache_home = cache_home_raw; otherwise fall back to
os.path.join(os.path.expanduser("~"), ".cache"); finally return
os.path.join(cache_home, "rebasebot"). Ensure you reference the
_default_working_dir function and the XDG_CACHE_HOME env var in the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Global credential.username and filter=blob:none break local fetches on git 2.52 Working directories are not removed

3 participants