
Make LinkInfo equality/hash location-based #332

Merged
reddevilmidzy merged 2 commits into main from codex/modify-partialeq/hash-for-linkinfo
Mar 12, 2026

Conversation

@reddevilmidzy
Owner

@reddevilmidzy reddevilmidzy commented Mar 12, 2026

Motivation

  • HashSet<LinkInfo> currently deduplicated by URL alone, so occurrences of the same URL found in multiple files/lines were collapsed into one entry.
  • LinkInfo's equality/hash criteria needed to include the location so that link aggregation operates per link occurrence.

Description

  • Changed LinkInfo's PartialEq implementation to compare url + file_path + line_number (src/git/link_extractor.rs).
  • Added file_path and line_number to the Hash implementation so the hash reflects the location (src/git/link_extractor.rs).
  • Updated the tests to match the per-location semantics and added a new test_find_link_in_content_deduplicates_same_line_same_url test (src/git/link_extractor.rs).

Testing

  • Ran cargo test link_extractor::tests::test_link_info_uniqueness -- --nocapture which passed (1 test passed).
  • Ran cargo test link_extractor::tests::test_find_link_in_content_ -- --nocapture which ran test_find_link_in_content_duplicates and test_find_link_in_content_deduplicates_same_line_same_url and both passed.
  • An initial invocation that passed multiple test names in one command (cargo test test_link_info_uniqueness test_find_link_in_content_duplicates ...) failed because cargo test accepts only a single test-name filter, so no tests ran; the subsequent targeted runs succeeded.

Codex Task

Summary by CodeRabbit

  • Refactor

    • Link deduplication is now location-aware (URLs at different files/lines are treated as distinct; identical URL+location are deduplicated).
  • Tests

    • Tests made more network-tolerant and use conditional execution; assertions now validate individual link entries rather than relying on strict counts.
  • Bug Fixes

    • Improved handling to ensure duplicates on the same line/URL are reliably deduplicated.

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

Walkthrough

LinkInfo equality and hashing were changed to include file_path and line_number (location-aware deduplication). Tests in link_extractor, link_checker, and service were updated to reflect location-aware behavior and to make network-dependent tests tolerant to request failures or skip when remote setup fails.

Changes

Cohort / File(s) Summary
Link Deduplication Logic
src/git/link_extractor.rs
Changed PartialEq and Hash for LinkInfo to include url, file_path, and line_number. Updated tests to expect deduplication by (url, file_path, line_number) and separate entries for same URL at different locations.
Link Checker Tests
src/link_checker/checker.rs
Relaxed test assertions to accept either successful validations/redirects or network-request error outcomes; added helper is_request_error_invalid and pattern-matched results to tolerate network variability.
Service Tests / Repo Init
src/link_checker/service.rs
Made repo initialization in tests use a let-else pattern to early-return when remote access fails; expanded per-item assertions for invalid links instead of single-count checks.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check — ❓ Inconclusive: The description covers motivation, implementation details, and testing efforts, but doesn't follow the template structure with the specified section headings (♟️ What's this PR about? and 🔗 Related Issues/PRs). Resolution: reorganize the description to match the repository template, and include any related issue references (e.g., 'Fixes #X' or 'Closes #X').
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The PR title accurately describes the main change: modifying LinkInfo's equality and hash implementations to be location-aware, which is reflected in all three modified files.



Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/git/link_extractor.rs (2)

149-213: The new tests don't pin down file_path and line_number independently.

The current cases only check locations that are "both different" or "both the same", so the tests could still pass if either file_path or line_number were later dropped from Eq/Hash. Adding one case each for (same file, different line) and (different file, same line) with the same URL would lock in the intent of this change more precisely.

🧪 Suggested test additions
     // Same URL, same file path, same line number => deduplicated
     let link2_duplicate = LinkInfo {
         url: "https://example.com".to_string(),
         file_path: "file2.txt".to_string(),
         line_number: 2,
     };

     links.insert(link2_duplicate);

     assert_eq!(
         links.len(),
         2,
         "Expected duplicates at identical location to be deduplicated"
     );
+
+    // Same URL, same file path, different line number => distinct
+    let link_same_file_different_line = LinkInfo {
+        url: "https://example.com".to_string(),
+        file_path: "file2.txt".to_string(),
+        line_number: 3,
+    };
+    links.insert(link_same_file_different_line);
+    assert_eq!(links.len(), 3, "Expected different line numbers to remain distinct");
+
+    // Same URL, different file path, same line number => distinct
+    let link_different_file_same_line = LinkInfo {
+        url: "https://example.com".to_string(),
+        file_path: "file3.txt".to_string(),
+        line_number: 3,
+    };
+    links.insert(link_different_file_same_line);
+    assert_eq!(links.len(), 4, "Expected different file paths to remain distinct");
-
-    // Different URL
+    // Different URL
     let link3 = LinkInfo {
         url: "https://example.org".to_string(),
         file_path: "file1.txt".to_string(),
         line_number: 1,
     };

     links.insert(link3);

-    // Should now have three entries because URL is different
-    assert_eq!(links.len(), 3, "Expected three unique link entries");
+    assert_eq!(links.len(), 5, "Expected five unique link entries");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/git/link_extractor.rs` around lines 149 - 213, The new tests don't
independently verify that both file_path and line_number participate in
equality/hash for LinkInfo; update tests to add two explicit cases: one
asserting two LinkInfo entries with the same url and same file_path but
different line_number are treated as distinct, and another asserting two entries
with the same url and same line_number but different file_path are distinct;
modify or extend the existing test_link_info_uniqueness (and/or add small
focused tests near it) and use the LinkInfo constructor and HashSet operations
(and also exercise find_link_in_content where appropriate) so the equality/hash
implementation must include both file_path and line_number to pass.

9-36: Deriving LinkInfo's equality/hash is safer.

The current implementation repeats the field list across the struct, PartialEq, and Hash, so if the identity fields change later it's easy to update only one of them and introduce a regression. Since this PR reflects all three fields in identity as-is, using derive(PartialEq, Eq, Hash) is simpler and easier to maintain.

♻️ Proposed refactor
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
 /// Represents a hyperlink found in a repository, along with its location.
 pub struct LinkInfo {
     /// The URL string. This should be a valid HTTP or HTTPS URL.
     pub url: String,
     /// The relative file path where the URL was found.
     pub file_path: String,
     /// The 1-based line number in the file where the URL was found.
     pub line_number: usize,
 }
-
-impl PartialEq for LinkInfo {
-    fn eq(&self, other: &Self) -> bool {
-        self.url == other.url
-            && self.file_path == other.file_path
-            && self.line_number == other.line_number
-    }
-}
-
-impl Eq for LinkInfo {}
-
-impl std::hash::Hash for LinkInfo {
-    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
-        self.url.hash(state);
-        self.file_path.hash(state);
-        self.line_number.hash(state);
-    }
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/git/link_extractor.rs` around lines 9 - 36, The LinkInfo type repeats
field lists across manual implementations of PartialEq and Hash, which is
error-prone; instead add PartialEq, Eq and Hash to the existing derive attribute
on the LinkInfo struct (so the compiler generates equality and hashing) and
remove the manual impl PartialEq, impl Eq and impl std::hash::Hash blocks
(references: the LinkInfo struct and the impl blocks named PartialEq, Eq, and
std::hash::Hash).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/git/link_extractor.rs`:
- Around line 149-213: The new tests don't independently verify that both
file_path and line_number participate in equality/hash for LinkInfo; update
tests to add two explicit cases: one asserting two LinkInfo entries with the
same url and same file_path but different line_number are treated as distinct,
and another asserting two entries with the same url and same line_number but
different file_path are distinct; modify or extend the existing
test_link_info_uniqueness (and/or add small focused tests near it) and use the
LinkInfo constructor and HashSet operations (and also exercise
find_link_in_content where appropriate) so the equality/hash implementation must
include both file_path and line_number to pass.
- Around line 9-36: The LinkInfo type repeats field lists across manual
implementations of PartialEq and Hash, which is error-prone; instead add
PartialEq, Eq and Hash to the existing derive attribute on the LinkInfo struct
(so the compiler generates equality and hashing) and remove the manual impl
PartialEq, impl Eq and impl std::hash::Hash blocks (references: the LinkInfo
struct and the impl blocks named PartialEq, Eq, and std::hash::Hash).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f58fd600-3161-4281-8175-238bc8c6258d

📥 Commits

Reviewing files that changed from the base of the PR and between 4d7b049 and fa654f3.

📒 Files selected for processing (1)
  • src/git/link_extractor.rs

@coveralls-official

Pull Request Test Coverage Report for Build 22983460971

Details

  • 68 of 72 (94.44%) changed or added relevant lines in 3 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.06%) to 82.98%

Changes missing coverage:
  • src/link_checker/service.rs — 6 of 7 changed/added lines covered (85.71%)
  • src/link_checker/checker.rs — 40 of 43 changed/added lines covered (93.02%)
Files with coverage reduction:
  • src/link_checker/service.rs — 3 new missed lines (80.39%)
Totals Coverage Status
Change from base Build 22542917693: 0.06%
Covered Lines: 1409
Relevant Lines: 1698

💛 - Coveralls

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/link_checker/checker.rs`:
- Around line 212-214: The current is_request_error_invalid only treats Invalid
messages containing "Request error" as transient, but it should also accept
Invalid results that represent retryable HTTP statuses (e.g., 429 and 5xx).
Update the is_request_error_invalid function to detect Invalid(msg) where msg
either contains "Request error" OR matches/contains an "HTTP status code:
<code>" pattern, parse the numeric status code, and return true for 429 or any
5xx code; keep returning false for other status codes. Apply the same logic to
the other helper occurrences noted in the review (the other
is_request_error_invalid uses at the other blocks) so all fallbacks treat
retryable HTTP statuses as transient.

In `@src/link_checker/service.rs`:
- Around line 203-206: Don't silently treat all RepoManager::from(&github_url)
errors as a network skip; instead match the Err and only return (skip) when the
error is a network/remote-access type, while letting other errors fail the test.
Replace the current let Ok(...) else { return } with a match on
RepoManager::from(&github_url): on Ok(repo_manager) continue as before; on
Err(e) inspect e and if it is a network/remote-access variant (e.g.,
RepoManagerError::Network(...) or an underlying io::Error indicating
network/unreachable) return to skip the test, otherwise panic or propagate the
error (e.g., assert!(false, "RepoManager setup failed: {:?}", e)) so local setup
failures (like temp dir creation) still surface as test failures. Use the
RepoManager::from and github_url identifiers to locate and update the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 45d22f40-d922-4cf1-9ece-d11f49620b8a

📥 Commits

Reviewing files that changed from the base of the PR and between fa654f3 and 176d14f.

📒 Files selected for processing (2)
  • src/link_checker/checker.rs
  • src/link_checker/service.rs

Comment on lines +212 to +214
fn is_request_error_invalid(result: &LinkCheckResult) -> bool {
matches!(result, LinkCheckResult::Invalid(msg) if msg.contains("Request error"))
}
Contributor


⚠️ Potential issue | 🟡 Minor

Accepting only "Request error" still leaves the live tests flaky on 429/5xx.

Besides transport failures, check_link also returns rate limits and transient outages from external services as Invalid("HTTP status code: ..."), and the current fallback treats all of those as failures. If the goal of this change is to absorb network variability, it would be more stable to also accept retryable HTTP statuses.

Possible fix direction
-fn is_request_error_invalid(result: &LinkCheckResult) -> bool {
-    matches!(result, LinkCheckResult::Invalid(msg) if msg.contains("Request error"))
+fn is_transient_network_invalid(result: &LinkCheckResult) -> bool {
+    matches!(
+        result,
+        LinkCheckResult::Invalid(msg)
+            if msg.contains("Request error")
+                || msg.starts_with("HTTP status code: 429")
+                || msg.starts_with("HTTP status code: 5")
+    )
 }
-            LinkCheckResult::Invalid(msg) => {
-                assert!(
-                    msg.contains("Request error"),
-                    "Expected network request error, got: {msg}"
-                );
-            }
+            result if is_transient_network_invalid(&result) => {}

The helper's other call sites in the tests should be aligned to the same criterion.

Also applies to: 227-230, 242-247, 258-265, 274-281, 290-299

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/link_checker/checker.rs` around lines 212 - 214, The current
is_request_error_invalid only treats Invalid messages containing "Request error"
as transient, but it should also accept Invalid results that represent retryable
HTTP statuses (e.g., 429 and 5xx). Update the is_request_error_invalid function
to detect Invalid(msg) where msg either contains "Request error" OR
matches/contains an "HTTP status code: <code>" pattern, parse the numeric status
code, and return true for 429 or any 5xx code; keep returning false for other
status codes. Apply the same logic to the other helper occurrences noted in the
review (the other is_request_error_invalid uses at the other blocks) so all
fallbacks treat retryable HTTP statuses as transient.

Comment on lines +203 to +206
let Ok(repo_manager) = RepoManager::from(&github_url) else {
// Network-dependent test: skip strict assertions when repository access is unavailable.
return;
};
Contributor


⚠️ Potential issue | 🟡 Minor

Skipping on every RepoManager::from failure hides real regressions.

RepoManager::from can return local errors such as temp-directory creation failures in addition to network ones; returning early on all of them makes such setup regressions look like passing tests. It is safer to relax only the remote-access-unavailable case and leave the rest as test failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/link_checker/service.rs` around lines 203 - 206, Don't silently treat all
RepoManager::from(&github_url) errors as a network skip; instead match the Err
and only return (skip) when the error is a network/remote-access type, while
letting other errors fail the test. Replace the current let Ok(...) else {
return } with a match on RepoManager::from(&github_url): on Ok(repo_manager)
continue as before; on Err(e) inspect e and if it is a network/remote-access
variant (e.g., RepoManagerError::Network(...) or an underlying io::Error
indicating network/unreachable) return to skip the test, otherwise panic or
propagate the error (e.g., assert!(false, "RepoManager setup failed: {:?}", e))
so local setup failures (like temp dir creation) still surface as test failures.
Use the RepoManager::from and github_url identifiers to locate and update the
code.

@reddevilmidzy reddevilmidzy merged commit 9065b6c into main Mar 12, 2026
10 checks passed
@reddevilmidzy reddevilmidzy deleted the codex/modify-partialeq/hash-for-linkinfo branch March 12, 2026 02:26