Skip to content

Add lint for broken doc links #13696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 16, 2025
Merged

Conversation

maxclaus
Copy link

fixes #2179

changelog: [doc_broken_link]: Add pedantic lint to catch broken doc links that won't produce a link tag by rustdoc.

@rustbot
Copy link
Collaborator

rustbot commented Nov 16, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Jarcho (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Nov 16, 2024
@maxclaus maxclaus force-pushed the lint-doc-broken-links branch from 8d859c9 to 6ff53c7 Compare November 16, 2024 16:34
@maxclaus
Copy link
Author

Just noticed there are tests failing due to a false positive like this:

/// Referencing an slice [T]

This will be considered a broken link although actually it isn't. I guess in order to fix it I won't be able to check fake value from pulldown_cmark::Parser::new_with_broken_link_callback anymore, since it doesn't provide the raw text to check why it was considered a broken link. I will try to work on a different solution, appreciate if there are any suggestions to achieve it.

@maxclaus
Copy link
Author

@maxclaus maxclaus force-pushed the lint-doc-broken-links branch 2 times, most recently from 640f282 to 8deb383 Compare November 18, 2024 00:27
@bors
Copy link
Contributor

bors commented Nov 21, 2024

☔ The latest upstream changes (presumably 8298da7) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Contributor

@Jarcho Jarcho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long wait. Left a couple of specific comments.

One thing you'll need to do is take the text input from the markdown parser. Currently you'll be linting inside code and html sections. Doc attribute can also contain multiple lines.

@Jarcho Jarcho added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Dec 6, 2024
@maxclaus
Copy link
Author

maxclaus commented Dec 7, 2024

@Jarcho thanks for taking the time to review this.

I applied two of your code improvement suggestions. I am a bit confused though about these others comments:

One thing you'll need to do is take the text input from the markdown parser.

Do you mean this lint should run on this parser's output instead of running before that step, which is done against the rust AST attributes? I tried doing that as the first approach, but the new_with_broken_link_callback we use sanitizes broken links replacing them with a fake text and link values, and that makes impossible to run any of this lint logic.

Currently you'll be linting inside code and html sections.

It is being applied only for AttrKind::DocComment attributes. Doesn't it guarantees only code doc comments are covered by this lint, which I imagine is what we want?

Doc attribute can also contain multiple lines.

This current logic is checking for broken links across multiple lines, so I am confused what you mean on this one, as it is already checking for multiple lines.

@maxclaus
Copy link
Author

maxclaus commented Dec 7, 2024

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties and removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Dec 7, 2024
@Jarcho
Copy link
Contributor

Jarcho commented Dec 8, 2024

It is being applied only for AttrKind::DocComment attributes. Doesn't it guarantees only code doc comments are covered by this lint, which I imagine is what we want?

Yes. And markdown contain code and html sections. Neither of which will try to parse links.

This current logic is checking for broken links across multiple lines, so I am confused what you mean on this one, as it is already checking for multiple lines.

Right now you're assuming that each attribute contains a single line. This is normally true, but isn't guaranteed.

@maxclaus
Copy link
Author

maxclaus commented Dec 8, 2024

@Jarcho So, to make sure I got it right:

Yes. And markdown contain code and html sections. Neither of which will try to parse links.

Are you saying AttrKind::DocComment attributes are markdown, which include code and html sections, and we should not try to parse links for those types, applying it on real document's content only? If that is what you mean, is your suggestion to follow the same approach from doc/mod.rs which uses the pulldown_cmark to parse the doc comments? But in this case we would not use new_with_broken_link_callback in order to properly handle those broken links, since as mentioned before, they get replaced with fake values in that case.

About this other one:

Right now you're assuming that each attribute contains a single line. This is normally true, but isn't guaranteed.

Would attributes with multiple lines be represented with \n so I should handle that case or are multiple lines represented in a different format? Also, is there are way to reproduce those multiple lines without actually adding \n to comments so I can properly have tests for that case?

@Jarcho
Copy link
Contributor

Jarcho commented Dec 11, 2024

Are you saying AttrKind::DocComment attributes are markdown, which include code and html sections, and we should not try to parse links for those types, applying it on real document's content only? If that is what you mean, is your suggestion to follow the same approach from doc/mod.rs which uses the pulldown_cmark to parse the doc comments? But in this case we would not use new_with_broken_link_callback in order to properly handle those broken links, since as mentioned before, they get replaced with fake values in that case.

I don't see why that would stop this from working. You can use the span given to the callback to know where to start parsing the input string for the link's destination.

Would attributes with multiple lines be represented with \n so I should handle that case or are multiple lines represented in a different format? Also, is there are way to reproduce those multiple lines without actually adding \n to comments so I can properly have tests for that case?

rustdoc joins all the doc attributes together with \n as the separator and then passes that to the markdown parser.

@maxclaus
Copy link
Author

maxclaus commented Dec 18, 2024

@Jarcho

I don't see why that would stop this from working. You can use the span given to the callback to know where to start parsing the input string for the link's destination.

Ok. I will give the markdown parser another try.

rustdoc joins all the doc attributes together with \n as the separator and then passes that to the markdown parser.

rustdoc might do that, but that is not what I have seen on clippy. Each line is a different AttrKind::DocComment, the linter does not provide a single AttrKind::DocComment merged with \n for all sibling lines of AttrKind::DocComment.

@maxclaus
Copy link
Author

maxclaus commented Jan 2, 2025

hey @Jarcho, I just pushed some temporary changes to confirm I am on the right direction.

I added a new function to the pulldown_cmark fake_broken_link_callback handler based on your suggestions. This is the data we have available within that function:

doc="Test invalid link, url part broken across multiple lines.\n[doc invalid link broken url scheme part part](https://\ntest.fake/doc_invalid_link_broken_url_scheme_part)"

 fragments=[
    DocFragment {
        span: tests/ui/doc_broken_link.rs:40:1: 40:62 (#0),
        item_id: None,
        doc: " Test invalid link, url part broken across multiple lines.",
        kind: SugaredDoc,
        indent: 1,
    },
    DocFragment {
        span: tests/ui/doc_broken_link.rs:41:1: 41:60 (#0),
        item_id: None,
        doc: " [doc invalid link broken url scheme part part](https://",
        kind: SugaredDoc,
        indent: 1,
    },
    DocFragment {
        span: tests/ui/doc_broken_link.rs:42:1: 42:55 (#0),
        item_id: None,
        doc: " test.fake/doc_invalid_link_broken_url_scheme_part)",
        kind: SugaredDoc,
        indent: 1,
    },
]

 bl=BrokenLink {
    span: 58..104,
    link_type: Shortcut,
    reference: Borrowed(
        "doc invalid link broken url scheme part part",
    ),
}

 text based on 'bl.span' range="[doc invalid link broken url scheme part part]"

So, as we can see the BrokenLink data we get from fake_broken_link_callback gives a span just for the link title. I could use that span initial position to start reading the link and get the url part too. Is that what you had in mind?

@rustbot review

@maxclaus
Copy link
Author

maxclaus commented Feb 4, 2025

hi @Jarcho, I have not heard back from you in a month, hopefully everything is alright with you and you are only taking a time off. If you are not back yet, is there someone else that could assist me with this PR?

@Jarcho
Copy link
Contributor

Jarcho commented Feb 8, 2025

Sorry for the wait. Reading from the link title span is what I had in mind.

rustdoc might do that, but that is not what I have seen on clippy. Each line is a different AttrKind::DocComment, the linter does not provide a single AttrKind::DocComment merged with \n for all sibling lines of AttrKind::DocComment.

Clippy is combining attrs_to_doc_fragments with add_doc_fragment from rustdoc. Between the two of them it adds the necessary line breaks and indentation edits to get the final markdown document. My main point here is doc comments are sequences of doc attributes, each doc attribute is just a string, and each of those strings can themselves contain line breaks. Each line is usually a separate attribute, but that isn't guaranteed to be the case.

With the implementation running through the broken links handler this will be handled properly.

@maxclaus
Copy link
Author

@rustbot review

fn fake_broken_link_callback<'a>(_: BrokenLink<'_>) -> Option<(CowStr<'a>, CowStr<'a>)> {
Some(("fake".into(), "fake".into()))
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move this function declaration down so it has access to doc and fragments variables.

@maxclaus
Copy link
Author

hey @Jarcho could you review this please?

Copy link
Contributor

@Jarcho Jarcho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, sorry for the long delay. Behaviour wise this looks to be fine. Can you rebase this so we can get a lintcheck run on it please?

@rustbot

This comment has been minimized.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Mar 31, 2025
@maxclaus maxclaus force-pushed the lint-doc-broken-links branch from 776fe26 to f577d71 Compare April 2, 2025 22:53
@maxclaus maxclaus force-pushed the lint-doc-broken-links branch from b5effa4 to bd214be Compare April 2, 2025 23:20
@maxclaus
Copy link
Author

maxclaus commented Apr 2, 2025

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties and removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Apr 2, 2025
Copy link
Contributor

@Jarcho Jarcho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like everything is good. Thank you.

Can you just squash the commits down please.

@Jarcho Jarcho added this pull request to the merge queue May 6, 2025
@Jarcho Jarcho removed this pull request from the merge queue due to a manual request May 6, 2025
@Jarcho
Copy link
Contributor

Jarcho commented May 21, 2025

Ping @maxclaus from triage. This is only waiting on the commits being squashed.

Fix false positives on broken link detection

Refactor variable names

Fix doc comment about broken link lint

Refactor, remove not used variable

Improve broken link to catch more cases and span point to whole link

Include reason why a link is considered broken

Drop some checker because rustdoc already warn about them

Refactor to use a single enum instead of multiple bool variables

Fix lint warnings

Rename function to collect broken links

Warn directly instead of collecting all entries first

Iterate directly rather than collecting

Temporary change to confirm with code reviewer the next steps

Handle broken links as part of the fake_broken_link_callback handler

Simplify broken link detection without state machine usage

Fix typos

Add url check to reduce false positives

Drop reason enum as there is only one reason

Fix duplicated diagnostics

Fix linter
@maxclaus maxclaus force-pushed the lint-doc-broken-links branch from bd214be to 8964f6e Compare June 5, 2025 18:59
@maxclaus
Copy link
Author

maxclaus commented Jun 5, 2025

@Jarcho done

@Jarcho Jarcho added this pull request to the merge queue Jun 16, 2025
Merged via the queue into rust-lang:master with commit af9d568 Jun 16, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lint broken URLs in documentation comments
5 participants