-
Notifications
You must be signed in to change notification settings - Fork 0
fix: implement longest match logic for ignore patterns #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Update is_ignored method to use longest matching pattern - Add test case to verify longest match behavior - Ensures more specific patterns take precedence over general ones 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
WalkthroughUpdated ignore evaluation in ResourceManager to use longest-match among matching patterns. Added tests and helpers to validate new semantics across multiple patterns and non-matching paths. No public API signature changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant RM as ResourceManager
participant IP as IgnorePatterns
participant R as Resource Path
RM->>IP: get_matching_patterns(R)
Note over IP: Collect all patterns that match R
IP-->>RM: matching_patterns[]
RM->>RM: select longest by "/" depth
alt longest match exists
RM-->>RM: return true (ignored)
else no match
RM-->>RM: return false (not ignored)
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Warning Review ran into problems🔥 ProblemsErrors were encountered while retrieving linked issues. Errors (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
site-builder/src/site/resource.rs (2)
615-619
: Anchor and escape wildcard regex to avoid false positives.Current
replace('*', ".*")
treats regex metacharacters literally and matches substrings. Escape and anchor to match the whole path.- fn is_pattern_match(pattern: &str, resource_path: &str) -> bool { - let path_regex = pattern.replace('*', ".*"); - Regex::new(&path_regex) - .map(|re| re.is_match(resource_path)) - .unwrap_or(false) - } + fn is_pattern_match(pattern: &str, resource_path: &str) -> bool { + // Treat '*' as wildcard; all other metacharacters are literals. + let escaped = regex::escape(pattern).replace("\\*", ".*"); + let path_regex = format!("^{}$", escaped); + Regex::new(&path_regex) + .map(|re| re.is_match(resource_path)) + .unwrap_or(false) + }
621-634
: “Longest match” selection is a no-op; add real specificity + testable helper.As written, this still returns true if any pattern matches. Also, slash-count is a weak specificity metric. Expose a helper with a stronger key and reuse it here.
- fn is_ignored(&self, resource_path: &str) -> bool { - if let Some(ws_resources) = &self.ws_resources { - if let Some(ignore_patterns) = &ws_resources.ignore { - // Find the longest matching pattern - return ignore_patterns - .iter() - .filter(|pattern| Self::is_pattern_match(pattern, resource_path)) - .max_by_key(|pattern| pattern.split('/').count()) - .is_some(); - } - } - false - } + fn is_ignored(&self, resource_path: &str) -> bool { + Self::is_ignored_with(&self.ws_resources, resource_path) + } + + // Prefer patterns with more separators and more non-'*' chars (tie-breaker). + fn is_ignored_with(ws_resources: &Option<WSResources>, resource_path: &str) -> bool { + ws_resources + .as_ref() + .and_then(|cfg| cfg.ignore.as_ref()) + .and_then(|patterns| { + patterns + .iter() + .filter(|p| Self::is_pattern_match(p, resource_path)) + .max_by_key(|p| { + ( + p.chars().filter(|&c| c == '/').count(), + p.chars().filter(|&c| c != '*').count(), + ) + }) + }) + .is_some() + }
🧹 Nitpick comments (4)
site-builder/src/site/resource.rs (4)
447-450
: Align header “longest match” with the improved specificity metric.Use the same specificity key as
is_ignored_with
for consistency.- .filter(|(path, _)| Self::is_pattern_match(path, resource_path)) - .max_by_key(|(path, _)| path.split('/').count()) + .filter(|(path, _)| Self::is_pattern_match(path, resource_path)) + .max_by_key(|(path, _)| { + ( + path.chars().filter(|&c| c == '/').count(), + path.chars().filter(|&c| c != '*').count(), + ) + })
875-877
: Use helper for remaining assertions; add an anchoring guard.Switch to helper and add a case ensuring
/bar/foo/...
does not match/foo/*
.- assert!(!resource_manager.is_ignored("/bar/file.txt")); - assert!(!resource_manager.is_ignored("/index.html")); + assert!(!ResourceManager::is_ignored_with(&ws_resources, "/bar/file.txt")); + assert!(!ResourceManager::is_ignored_with(&ws_resources, "/index.html")); + // Should not match because pattern must match from the start: + assert!(!ResourceManager::is_ignored_with(&ws_resources, "/bar/foo/file.txt"));
340-344
: Minor: log message typo.Comment ends with a stray
/
.- // Skip if resource matches ignore patterns/ + // Skip if resource matches ignore patterns
610-634
: Consider using globset for correctness and performance.Compiling regexes per call is costly and error-prone.
globset
supports*
/**
semantics, precompilation, and fast matching. Can pre-build sets from ws-resources on load.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
site-builder/src/site/resource.rs
(2 hunks)
🔇 Additional comments (1)
site-builder/src/site/resource.rs (1)
879-893
: Mock helper LGTM.The ignore set covers both general and specific patterns; good coverage base.
Summary
Changes
is_ignored
method insite-builder/src/site/resource.rs
to implement longest matching logictest_is_ignored
to verify the longest match behavior/foo/bar/baz/*
) take precedence over more general ones (e.g.,/foo/*
)Test
/foo/bar/baz/file.txt
is correctly ignored by the longest pattern/foo/bar/baz/*
rather than/foo/*
/foo/file.txt
is correctly ignored by/foo/*
pattern/foo/bar/file.txt
is not ignored (no matching pattern)Fixes MystenLabs#285
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests