Add lint for broken doc links #13696

maxclaus · 2024-11-16T16:23:31Z

changelog: [doc_broken_link]: Add pedantic lint to catch broken doc links that won't produce a link tag by rustdoc.

rustbot · 2024-11-16T16:23:36Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Jarcho (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

maxclaus · 2024-11-16T16:58:14Z

Just noticed there are tests failing due to a false positive like this:

/// Referencing an slice [T]

This is flagged as a broken link even though it isn't. To fix it, I don't think I can keep checking for the fake value returned through pulldown_cmark::Parser::new_with_broken_link_callback, since that doesn't provide the raw text needed to tell why it was considered a broken link. I will try a different solution, and would appreciate any suggestions.

maxclaus · 2024-11-16T21:51:28Z

I will try an approach similar to the one used in https://github.com/rust-lang/rust-clippy/blob/master/clippy_lints/src/tabs_in_doc_comments.rs

bors · 2024-11-21T22:12:56Z

☔ The latest upstream changes (presumably 8298da7) made this pull request unmergeable. Please resolve the merge conflicts.

Jarcho

Sorry for the long wait. Left a couple of specific comments.

One thing you'll need to do is take the text input from the markdown parser. Currently you'll be linting inside code and HTML sections. Doc attributes can also contain multiple lines.

clippy_lints/src/doc/broken_link.rs

maxclaus · 2024-12-07T21:48:12Z

@Jarcho thanks for taking the time to review this.

I applied two of your code improvement suggestions. I am a bit confused, though, about these other comments:

One thing you'll need to do is take the text input from the markdown parser.

Do you mean this lint should run on the parser's output rather than before that step, against the Rust AST attributes, as it does now? I tried that as my first approach, but the new_with_broken_link_callback we use sanitizes broken links, replacing them with fake text and link values, and that makes it impossible to run any of this lint's logic.

Currently you'll be linting inside code and html sections.

It is applied only to AttrKind::DocComment attributes. Doesn't that guarantee that only doc comments in code are covered by this lint, which I imagine is what we want?

Doc attribute can also contain multiple lines.

The current logic already checks for broken links across multiple lines, so I am not sure what you mean by this one.

maxclaus · 2024-12-07T21:48:55Z

@rustbot review

Jarcho · 2024-12-08T00:44:46Z

It is applied only to AttrKind::DocComment attributes. Doesn't that guarantee that only doc comments in code are covered by this lint, which I imagine is what we want?

Yes. And markdown can contain code and HTML sections, neither of which will be parsed for links.

This current logic is checking for broken links across multiple lines, so I am confused what you mean on this one, as it is already checking for multiple lines.

Right now you're assuming that each attribute contains a single line. This is normally true, but isn't guaranteed.
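
Both points can be seen in ordinary source. The following is only an illustrative sketch (the function names are made up for this example, and it does not come from the PR's test suite): the first item has link-like text inside an inline code span and a raw HTML block, and the remaining items show the different ways doc attributes can be written, two of which put a line break inside a single attribute.

```rust
/// Inline code such as `[not a link](https://` is not parsed for links.
///
/// <pre>
/// [not a link either](https://
/// example.com)
/// </pre>
pub fn code_and_html_sections() {}

/// Sugared doc comment: this line is one doc attribute,
/// and this line is a second, separate doc attribute.
pub fn one_attribute_per_line() {}

// Explicit attribute form: one attribute whose string holds two lines.
#[doc = "A single attribute\nwhose string contains a line break."]
pub fn explicit_attribute() {}

/** Block doc comment: also a single attribute,
    even though its text spans two lines. */
pub fn block_comment() {}
```

A lint that walks attributes one at a time and treats each as one line would handle `one_attribute_per_line` correctly, but not the last two items.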

maxclaus · 2024-12-08T13:19:57Z

@Jarcho So, to make sure I got it right:

Yes. And markdown can contain code and HTML sections, neither of which will be parsed for links.

Are you saying AttrKind::DocComment attributes are markdown, which can include code and HTML sections, and that we should not try to parse links in those sections, only in the actual document content? If so, is your suggestion to follow the same approach as doc/mod.rs, which uses pulldown_cmark to parse the doc comments? In that case we could not use new_with_broken_link_callback to handle those broken links properly since, as mentioned before, it replaces them with fake values.

About this other one:

Right now you're assuming that each attribute contains a single line. This is normally true, but isn't guaranteed.

Would attributes with multiple lines be represented with \n, so that I should handle that case, or are multiple lines represented in a different format? Also, is there a way to reproduce those multiple lines without actually adding \n to comments, so I can properly test that case?

Jarcho · 2024-12-11T02:13:37Z

Are you saying AttrKind::DocComment attributes are markdown, which can include code and HTML sections, and that we should not try to parse links in those sections, only in the actual document content? If so, is your suggestion to follow the same approach as doc/mod.rs, which uses pulldown_cmark to parse the doc comments? In that case we could not use new_with_broken_link_callback to handle those broken links properly since, as mentioned before, it replaces them with fake values.

I don't see why that would stop this from working. You can use the span given to the callback to know where to start parsing the input string for the link's destination.

Would attributes with multiple lines be represented with \n, so that I should handle that case, or are multiple lines represented in a different format? Also, is there a way to reproduce those multiple lines without actually adding \n to comments, so I can properly test that case?

rustdoc joins all the doc attributes together with \n as the separator and then passes that to the markdown parser.

maxclaus · 2024-12-18T21:34:04Z

@Jarcho

I don't see why that would stop this from working. You can use the span given to the callback to know where to start parsing the input string for the link's destination.

Ok. I will give the markdown parser another try.

rustdoc joins all the doc attributes together with \n as the separator and then passes that to the markdown parser.

rustdoc might do that, but that is not what I have seen in Clippy. Each line is a separate AttrKind::DocComment; the linter does not provide a single AttrKind::DocComment with all sibling lines merged with \n.

maxclaus · 2025-01-02T15:42:50Z

Hey @Jarcho, I just pushed some temporary changes to confirm I am heading in the right direction.

I added a new function to the pulldown_cmark fake_broken_link_callback handler based on your suggestions. This is the data we have available within that function:

doc="Test invalid link, url part broken across multiple lines.\n[doc invalid link broken url scheme part part](https://\ntest.fake/doc_invalid_link_broken_url_scheme_part)"

 fragments=[
    DocFragment {
        span: tests/ui/doc_broken_link.rs:40:1: 40:62 (#0),
        item_id: None,
        doc: " Test invalid link, url part broken across multiple lines.",
        kind: SugaredDoc,
        indent: 1,
    },
    DocFragment {
        span: tests/ui/doc_broken_link.rs:41:1: 41:60 (#0),
        item_id: None,
        doc: " [doc invalid link broken url scheme part part](https://",
        kind: SugaredDoc,
        indent: 1,
    },
    DocFragment {
        span: tests/ui/doc_broken_link.rs:42:1: 42:55 (#0),
        item_id: None,
        doc: " test.fake/doc_invalid_link_broken_url_scheme_part)",
        kind: SugaredDoc,
        indent: 1,
    },
]

 bl=BrokenLink {
    span: 58..104,
    link_type: Shortcut,
    reference: Borrowed(
        "doc invalid link broken url scheme part part",
    ),
}

 text based on 'bl.span' range="[doc invalid link broken url scheme part part]"

So, as we can see, the BrokenLink data we get in fake_broken_link_callback gives a span for just the link title. I could use that span's start position to begin reading the link and get the URL part too. Is that what you had in mind?

@rustbot review
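
To make the idea concrete, here is a minimal std-only sketch of reading the destination starting from the link title span. It is not the code in this PR: `destination_broken_across_lines` is a hypothetical helper, it takes a plain `Range<usize>` instead of pulldown_cmark's `BrokenLink`, and it scans naively for the first `)` rather than handling nested or escaped parentheses.

```rust
use std::ops::Range;

/// Hypothetical helper: given the joined doc text and the span of a link
/// title (like `bl.span` in the dump above), return the destination text
/// if a parenthesised destination follows the title and contains a line break.
fn destination_broken_across_lines(doc: &str, title_span: Range<usize>) -> Option<&str> {
    // Text immediately after the closing `]` of the title.
    let rest = doc.get(title_span.end..)?;
    // `[T]` with no `(` right after it is not an inline link, so it is ignored.
    let after_paren = rest.strip_prefix('(')?;
    // Naive scan: take everything up to the first `)`.
    let close = after_paren.find(')')?;
    let destination = &after_paren[..close];
    // Only report destinations that were split across lines.
    destination.contains('\n').then_some(destination)
}
```

With the `doc` string and the `58..104` span from the dump above, this returns the destination `https://\ntest.fake/doc_invalid_link_broken_url_scheme_part`, while a title such as `[T]` with nothing after it yields `None`, which avoids the slice false positive mentioned earlier.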

maxclaus · 2025-02-04T17:32:36Z

Hi @Jarcho, I have not heard back from you in a month. Hopefully everything is all right and you are just taking some time off. If you are not back yet, is there someone else who could assist me with this PR?

Jarcho · 2025-02-08T23:03:17Z

Sorry for the wait. Reading from the link title span is what I had in mind.

rustdoc might do that, but that is not what I have seen in Clippy. Each line is a separate AttrKind::DocComment; the linter does not provide a single AttrKind::DocComment with all sibling lines merged with \n.

Clippy is combining attrs_to_doc_fragments with add_doc_fragment from rustdoc. Between the two of them, they add the necessary line breaks and indentation edits to get the final markdown document. My main point here is that a doc comment is a sequence of doc attributes, each doc attribute is just a string, and each of those strings can itself contain line breaks. Each line is usually a separate attribute, but that isn't guaranteed to be the case.

With the implementation running through the broken links handler this will be handled properly.
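
As a rough illustration of that point, the sketch below joins fragment strings into one markdown document. It is a simplified stand-in, not the real `attrs_to_doc_fragments` or `add_doc_fragment`: the `Fragment` struct and `join_fragments` function are hypothetical, and the indentation handling is reduced to stripping a single leading space per line.

```rust
/// Hypothetical stand-in for a doc fragment: just the attribute's text.
struct Fragment<'a> {
    doc: &'a str,
}

/// Join fragments into one markdown string with `\n` between lines.
/// A fragment may itself contain `\n`, so each one is split into lines first.
fn join_fragments(fragments: &[Fragment<'_>]) -> String {
    fragments
        .iter()
        // `split('\n')` keeps empty lines, so blank doc lines survive.
        .flat_map(|fragment| fragment.doc.split('\n'))
        // Simplified indentation edit: drop one leading space if present.
        .map(|line| line.strip_prefix(' ').unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Feeding it the three `doc` strings from the `DocFragment` dump above reproduces the joined `doc` value shown there, and a single fragment such as `" a\n b"` becomes `"a\nb"`, so the parser sees the same text regardless of how the attributes were split.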

maxclaus · 2025-02-11T22:53:02Z

@rustbot review

maxclaus · 2025-02-12T11:00:03Z

clippy_lints/src/doc/mod.rs

-    fn fake_broken_link_callback<'a>(_: BrokenLink<'_>) -> Option<(CowStr<'a>, CowStr<'a>)> {
-        Some(("fake".into(), "fake".into()))
-    }
-


I had to move this function declaration down so it has access to doc and fragments variables.
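
The reason a free function has to become something declared after those variables can be sketched without pulldown_cmark. In the example below, `for_each_broken_link` is a hypothetical stand-in for a parser that reports broken-link spans to a callback, and `collect_titles` is likewise made up for illustration. A nested `fn` item cannot refer to the enclosing function's locals, whereas a closure written after `doc` exists can borrow it.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a parser that reports each broken-link span
/// to a caller-supplied callback.
fn for_each_broken_link(spans: &[Range<usize>], mut callback: impl FnMut(Range<usize>)) {
    for span in spans {
        callback(span.clone());
    }
}

/// Collect the source text of every reported span.
fn collect_titles(doc: &str, spans: &[Range<usize>]) -> Vec<String> {
    let mut titles = Vec::new();
    // The closure borrows `doc` and `titles` from the enclosing scope;
    // a nested `fn` item declared here could not capture either of them.
    for_each_broken_link(spans, |span| titles.push(doc[span].to_string()));
    titles
}
```

For example, `collect_titles("see [a] and [b]", &[4..7, 12..15])` yields `["[a]", "[b]"]`.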

maxclaus · 2025-03-10T11:51:54Z

hey @Jarcho could you review this please?

Jarcho

Again, sorry for the long delay. Behaviour wise this looks to be fine. Can you rebase this so we can get a lintcheck run on it please?

clippy_lints/src/doc/broken_link.rs

maxclaus · 2025-04-02T23:31:01Z

@rustbot review

Jarcho

Looks like everything is good. Thank you.

Can you just squash the commits down please.

Jarcho · 2025-05-21T20:44:32Z

Ping @maxclaus from triage. This is only waiting on the commits being squashed.

Fix false positives on broken link detection
Refactor variable names
Fix doc comment about broken link lint
Refactor, remove not used variable
Improve broken link to catch more cases and span point to whole link
Include reason why a link is considered broken
Drop some checker because rustdoc already warn about them
Refactor to use a single enum instead of multiple bool variables
Fix lint warnings
Rename function to collect broken links
Warn directly instead of collecting all entries first
Iterate directly rather than collecting
Temporary change to confirm with code reviewer the next steps
Handle broken links as part of the fake_broken_link_callback handler
Simplify broken link detection without state machine usage
Fix typos
Add url check to reduce false positives
Drop reason enum as there is only one reason
Fix duplicated diagnostics
Fix linter

maxclaus · 2025-06-05T19:00:02Z

@Jarcho done

rustbot assigned Jarcho Nov 16, 2024

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Nov 16, 2024

maxclaus force-pushed the lint-doc-broken-links branch from 8d859c9 to 6ff53c7 Compare November 16, 2024 16:34

maxclaus force-pushed the lint-doc-broken-links branch 2 times, most recently from 640f282 to 8deb383 Compare November 18, 2024 00:27

Jarcho requested changes Dec 4, 2024

Jarcho added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Dec 6, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties and removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Dec 7, 2024

maxclaus commented Feb 12, 2025

Jarcho reviewed Mar 31, 2025

rustbot added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Mar 31, 2025

maxclaus force-pushed the lint-doc-broken-links branch from 776fe26 to f577d71 Compare April 2, 2025 22:53

maxclaus commented Apr 2, 2025

maxclaus force-pushed the lint-doc-broken-links branch from b5effa4 to bd214be Compare April 2, 2025 23:20

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties and removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Apr 2, 2025

Jarcho approved these changes May 6, 2025

Jarcho added this pull request to the merge queue May 6, 2025

Jarcho removed this pull request from the merge queue due to a manual request May 6, 2025

maxclaus force-pushed the lint-doc-broken-links branch from bd214be to 8964f6e Compare June 5, 2025 18:59

Jarcho added this pull request to the merge queue Jun 16, 2025

Merged via the queue into rust-lang:master with commit af9d568 Jun 16, 2025
11 checks passed
