Commit 6bb9bb6

path-walk: support wildcard pathspecs for blob filtering

Previously, walk_objects_by_path() rejected pathspecs containing
wildcards or magic with an error ("provided pathspec is too generic").
This was overly restrictive, as wildcard pathspecs like "d/file.*.txt"
are useful for narrowing which blobs to process (e.g., during
'git backfill').

Support wildcard pathspecs by making three changes:

 1. Add an 'exact_pathspecs' flag to path_walk_context. When the
    pathspec has no wildcards or magic, set this flag and use the
    existing fast-path prefix matching in add_tree_entries(). When
    wildcards are present, skip that block since prefix matching
    cannot handle glob patterns.

 2. Disable revision-level commit pruning (revs->prune = 0) for
    wildcard pathspecs. The revision walk uses the pathspec to filter
    commits via TREESAME detection. For exact prefix pathspecs this
    works well, but wildcard pathspecs may fail to match through
    TREESAME because fnmatch with WM_PATHNAME does not cross directory
    boundaries. Disabling pruning ensures all commits are visited and
    their trees are available for the path-walk to filter.

 3. Add a match_pathspec() check in walk_path() to filter out blobs
    whose full path does not match the pathspec. This provides the
    actual blob-level filtering for wildcard pathspecs.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

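The directory-boundary behaviour behind change 2 can be illustrated with a small sketch. It uses POSIX fnmatch() with FNM_PATHNAME as an analogy only: Git matches pathspecs with its own wildmatch code and the WM_PATHNAME flag, not with fnmatch(3), and the helper name glob_matches() is made up for this example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Git): report whether `path` matches
 * `pattern` when wildcards are not allowed to match a '/'.
 * POSIX FNM_PATHNAME stands in for Git's WM_PATHNAME here.
 */
static bool glob_matches(const char *pattern, const char *path)
{
	assert(pattern != NULL && path != NULL);
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

Under these assumptions, glob_matches("d/file.*.txt", "d/file.1.txt") is true, while glob_matches("d/file.*.txt", "d/file.a/b.txt") is false because `*` may not consume the `/`. The bare directory name "d" does not match the full pattern either, which is consistent with the commit message's point that a wildcard pathspec may not match during tree-level comparisons.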
1 parent 2ef22aa commit 6bb9bb6

2 files changed: 31 additions, 7 deletions

path-walk.c

Lines changed: 14 additions & 7 deletions
@@ -62,6 +62,8 @@ struct path_walk_context {
 	 */
 	struct prio_queue path_stack;
 	struct strset path_stack_pushed;
+
+	unsigned exact_pathspecs:1;
 };
 
 static int compare_by_type(const void *one, const void *two, void *cb_data)
@@ -206,7 +208,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
 			    match != MATCHED)
 				continue;
 		}
-		if (ctx->revs->prune_data.nr) {
+		if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
 			struct pathspec *pd = &ctx->revs->prune_data;
 			bool found = false;
 
@@ -317,6 +319,13 @@ static int walk_path(struct path_walk_context *ctx,
 		return 0;
 	}
 
+	if (list->type == OBJ_BLOB &&
+	    ctx->revs->prune_data.nr &&
+	    !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
+			    path, strlen(path), 0,
+			    NULL, 0))
+		return 0;
+
 	/* Evaluate function pointer on this data, if requested. */
 	if ((list->type == OBJ_TREE && ctx->info->trees) ||
 	    (list->type == OBJ_BLOB && ctx->info->blobs) ||
@@ -525,14 +534,12 @@ int walk_objects_by_path(struct path_walk_info *info)
 	info->revs->tag_objects = 1;
 
 	if (ctx.revs->prune_data.nr) {
-		/*
-		 * Check that all pathspecs are prefixes, or remove
-		 * them from consideration, with a warning.
-		 */
 		struct pathspec *pd = &ctx.revs->prune_data;
 
-		if (pd->has_wildcard || pd->magic)
-			return error(_("provided pathspec is too generic"));
+		if (!pd->has_wildcard && !pd->magic)
+			ctx.exact_pathspecs = 1;
+		else
+			ctx.revs->prune = 0;
 	}
 
 	/* Insert a single list for the root tree into the paths. */
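
The two modes set up by the hunks above can be sketched outside of Git as follows. This is a simplified illustration, not a copy of Git's logic: struct sketch_pathspec, under_prefix(), sketch_descend() and sketch_keep_blob() are hypothetical names, POSIX fnmatch() with FNM_PATHNAME stands in for match_pathspec(), and tree paths are assumed to end in '/'.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the pathspec state; not Git's struct pathspec. */
struct sketch_pathspec {
	const char *pattern;	/* e.g. "d/" or "d/file.*.txt" */
	bool exact;		/* true when there are no wildcards or magic */
};

/* True when `path` equals `prefix` or lies beneath it at a '/' boundary. */
static bool under_prefix(const char *prefix, const char *path)
{
	size_t plen = strlen(prefix);

	if (plen == 0)
		return true;
	if (strncmp(path, prefix, plen) != 0)
		return false;
	if (path[plen] == '\0')
		return true;
	return prefix[plen - 1] == '/' || path[plen] == '/';
}

/*
 * Tree-level decision. Exact pathspecs can prune by prefix; wildcard
 * pathspecs skip pruning here and defer to the blob-level check.
 * Assumes `tree_path` ends in '/'.
 */
static bool sketch_descend(const struct sketch_pathspec *ps,
			   const char *tree_path)
{
	assert(ps != NULL && tree_path != NULL);
	if (!ps->exact)
		return true;
	if (under_prefix(ps->pattern, tree_path))
		return true;
	/* The tree is an ancestor of the prefix, e.g. "d/" for "d/e/f.txt". */
	return strncmp(ps->pattern, tree_path, strlen(tree_path)) == 0;
}

/* Blob-level decision: the full path must match the pathspec. */
static bool sketch_keep_blob(const struct sketch_pathspec *ps,
			     const char *blob_path)
{
	assert(ps != NULL && blob_path != NULL);
	if (ps->exact)
		return under_prefix(ps->pattern, blob_path);
	return fnmatch(ps->pattern, blob_path, FNM_PATHNAME) == 0;
}
```

In this sketch a wildcard pathspec such as "d/file.*.txt" never prunes a tree, so every tree is visited and only sketch_keep_blob() decides which blobs survive. That mirrors the trade-off in the commit: wildcard pathspecs give up the prefix fast path and commit pruning in exchange for correct blob-level filtering.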

t/t5620-backfill.sh

Lines changed: 17 additions & 0 deletions
@@ -297,6 +297,23 @@ test_expect_success 'backfill with multiple pathspecs' '
 	test_line_count = 16 missing
 '
 
+test_expect_success 'backfill with wildcard pathspec' '
+	test_when_finished rm -rf backfill-path &&
+	git clone --bare --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" backfill-path &&
+
+	# No blobs yet
+	git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+	test_line_count = 48 missing &&
+
+	git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
+	test_must_be_empty err &&
+
+	git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+	test_line_count = 40 missing
+'
+
 test_expect_success 'backfill with --all' '
 	test_when_finished rm -rf backfill-all &&
 	git clone --no-checkout --filter=blob:none \