Detect padding in doc comments #204

pjsier · 2021-10-10T18:46:32Z

What does this PR accomplish?

🦚 Feature

Closes #133.

Changes proposed by this PR:

Detect padding in doc comments with CommentPaddingStyle enum

Notes to reviewer:

Adds enum and detect_padding method to parse and store padding style

📜 Checklist

Works on the ./demo sub directory
Test coverage is excellent and passes
Documentation is thorough

src/documentation/literal.rs

drahnr · 2021-10-11T12:39:49Z

src/documentation/literal.rs

@@ -111,13 +111,26 @@ impl CommentVariant {
    }
 }

+#[derive(Clone, Hash, PartialEq)]
+pub enum Padding {
+    Padding(String),


If we only have one content variant here, we might want to limit it to AsteriskSpace{ leading_spaces: usize }

That makes sense, thanks! To make sure I'm understanding, for the string " * ", it would be AsteriskSpace with 1 for leading_spaces or 3 to account for the whole string?

I was thinking of the two variants: Doc and NonDoc, the first might be a single span for now and hence contain all leading spaces as well.

Would this be a separate enum or replace the current one? I updated the current enum to reflect your comments and the more specific variant

That was misleading, sorry about that. I think we can represent both with the enum Padding with the Asterisk* variant.
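As a rough illustration of the enum shape discussed in this thread, here is a minimal sketch. The `NoPadding` variant and the helper `detect_line_padding` are hypothetical names, not the code in this PR. The sketch also assumes that `leading_spaces` counts only the whitespace in front of the asterisk, so " * " would give 1; the thread does not settle that question.

```rust
/// Padding style of a single line inside a `/** ... */` block comment.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
pub enum Padding {
    /// The line has no `*` gutter.
    NoPadding,
    /// The line starts with `leading_spaces` whitespace characters and a `*`.
    AsteriskSpace { leading_spaces: usize },
}

/// Hypothetical helper: classify one line of a block comment.
pub fn detect_line_padding(line: &str) -> Padding {
    // Count the whitespace characters in front of the first non-whitespace one.
    let leading_spaces = line.chars().take_while(|c| c.is_whitespace()).count();
    let rest = line.trim_start();
    match rest.strip_prefix('*') {
        // Require whitespace (or end of line) after the asterisk, so that
        // `*emphasis*` and the closing `*/` are not taken for padding.
        Some(after) if after.is_empty() || after.starts_with(char::is_whitespace) => {
            Padding::AsteriskSpace { leading_spaces }
        }
        _ => Padding::NoPadding,
    }
}
```

Storing a count rather than a `String` keeps the variant cheap to copy and compare, at the cost of not distinguishing tabs from spaces.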

pjsier · 2021-10-11T15:43:31Z

Thanks for the feedback on this! I updated the branch with your comments, should this be incorporated into the trim_span method or should it only detect the padding style for now and leave the padding inside the TrimmedLiteral?

drahnr · 2021-10-11T16:11:09Z

> Thanks for the feedback on this! I updated the branch with your comments, should this be incorporated into the trim_span method or should it only detect the padding style for now and leave the padding inside the TrimmedLiteral?

I think removing it at the span/syn/ra_ap_syntax stage will simplify later stages significantly (or: avoid additional code changes), so I'd recommend having a clean TrimmedLiteral whose representation does not include the leading \s*\*\s.

Note: There is a caveat that a few consecutive \s\*\s could also be a markdown list if they are not continuous across every line :)
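One way to respect the caveat above is to strip the gutter only when every line of the comment body carries it. The following is a sketch under that assumption; `strip_gutter` and `remove_padding` are hypothetical helpers that operate on the text between `/**` and `*/`, not functions from this code base.

```rust
/// Hypothetical helper: returns the text after a leading `\s*\*\s` gutter,
/// or `None` if the line does not have one.
fn strip_gutter(line: &str) -> Option<&str> {
    let rest = line.trim_start().strip_prefix('*')?;
    if rest.is_empty() {
        // A bare ` *` line stands for an empty line.
        return Some(rest);
    }
    // Drop exactly one whitespace character after the asterisk.
    let mut chars = rest.chars();
    match chars.next() {
        Some(c) if c.is_whitespace() => Some(chars.as_str()),
        _ => None,
    }
}

/// Remove the gutter only if every line has it. Otherwise the asterisks may
/// be markdown list items, and the input is returned unchanged.
pub fn remove_padding(body: &str) -> String {
    let stripped: Option<Vec<&str>> = body.lines().map(strip_gutter).collect();
    match stripped {
        Some(lines) => lines.join("\n"),
        None => body.to_string(),
    }
}
```

A body in which every single line is a list item remains ambiguous; this sketch treats it as padding and does not try to resolve that case.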

pjsier · 2021-10-11T16:48:06Z

Great, thanks! Should this be in or before the trim_span method then? I know you had mentioned moving the padding detection after, but it seems like we would need it there.

Also, I think we'll need to bump the version of fancy-regex to take advantage of the replace_all method for parsing this if that's alright

drahnr · 2021-10-11T16:55:34Z

> Also, I think we'll need to bump the version of fancy-regex to take advantage of the replace_all method for parsing this if that's alright

Wherever you see fit :)

Bump deps as needed.

pjsier · 2021-10-13T22:03:09Z

I made some updates to reflect the recent feedback but it looks like I'm a bit stuck again. rendered currently includes some content that is ignored like the prefix and suffix, so I initially tried to maintain this and only store metadata about the padding without modifying the content of rendered directly. I ran into issues because the replacement logic for as_str became more complicated than the offsets from prefix and postfix currently used, and it seemed like it would require returning a String instead of &str

Should rendered include the padding strings, and if so should a separate field be added to TrimmedLiteral that stores the content with padding strings removed? Thanks again!

drahnr · 2021-10-14T06:42:46Z

src/documentation/literal.rs

@@ -656,7 +730,7 @@ mod tests {
    block_comment_test!(
        trimmed_multi_doc,
        "/**
-mood
+ * mood


I think both variants should work, the * should be optional.

drahnr · 2021-10-14T06:44:42Z

src/documentation/literal.rs

+        let content = "/**\n doc\n doc\n */".to_string();
+        assert_matches!(
+            detect_and_remove_padding(&content),
+            (CommentPaddingStyle::NoPadding, content)
+        );


I'd prefer smaller test cases, even if this has a compile time cost to it.

And I think we need a few more to cover the indented variants, as in multiple leading spaces before the asterisk.

drahnr · 2021-10-14T06:45:23Z

src/documentation/literal.rs

+        assert_matches!(
+            detect_and_remove_padding(&content),
+            (CommentPaddingStyle::NoPadding, content)


👍 nice use of assert_matches :)

drahnr · 2021-10-14T06:47:30Z

src/documentation/literal.rs

+    lazy_static! {
+        static ref PADDING_STR: Regex =
+            Regex::new(r##"(?m)^\s\*\s"##).expect("PADDING_STR regex compiles");
+    };


#202 introduced an optimization to avoid the regex entirely and use SIMD instructions where available. It is probably worth hand-rolling this too, but that can be done once you're confident of the structure. This works and is good for the time being 👍
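For reference, a hand-rolled, regex-free check for the same per-line prefix could look roughly like the sketch below. It is not the approach taken in #202 and uses no SIMD; it only shows that the `^\s\*\s` pattern reduces to a three-byte comparison. The function names are hypothetical, and the sketch assumes ASCII whitespace only, whereas `\s` in the regex is Unicode-aware.

```rust
/// Hypothetical hand-rolled counterpart of the `^\s\*\s` check for one line.
/// Returns the byte length of the padding prefix, if present.
/// Assumption: only ASCII whitespace counts as padding.
pub fn padding_prefix_len(line: &str) -> Option<usize> {
    match line.as_bytes() {
        [a, b'*', c, ..] if a.is_ascii_whitespace() && c.is_ascii_whitespace() => Some(3),
        _ => None,
    }
}

/// Strip the prefix from every line that has it, borrowing from the input.
pub fn strip_padding_lines(content: &str) -> Vec<&str> {
    content
        .lines()
        .map(|line| match padding_prefix_len(line) {
            // The three matched bytes are ASCII, so index 3 is a char boundary.
            Some(n) => &line[n..],
            None => line,
        })
        .collect()
}
```

Because the matched bytes are all ASCII, slicing at the returned length cannot split a multi-byte character.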

drahnr · 2021-10-18T07:12:42Z

> I made some updates to reflect the recent feedback but it looks like I'm a bit stuck again. rendered currently includes some content that is ignored like the prefix and suffix, so I initially tried to maintain this and only store metadata about the padding without modifying the content of rendered directly. I ran into issues because the replacement logic for as_str became more complicated than the offsets from prefix and postfix currently used, and it seemed like it would require returning a String instead of &str

I think you have two options. Either replace as_str() -> &'_ str with an as_str_set() -> &[&'_ str] plus an impl ToString, and deal with the fallout across the code base. Note that this will not avoid the allocation, since creating the set itself requires one.
While the above is possible, I think it's much easier to split a TrimmedLiteral into multiple literals and adjust the Cluster (if needed; I did not dig into the details). This would essentially avoid the whole ordeal of handling padding in fn render() and instead reuse the existing TrimmedLiteral clustering logic, where padding could be just another delimiter, as initially discussed.
It's super important, though, to get the offsets right (with emojis and multi-width characters) and to have sufficient test cases: any issue here will cause wrong spans to be passed throughout the pipeline, and that's not-fun™ to debug.
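To make the offset concern concrete, the sketch below splits a comment body into padding-free segments and records where each one sits in the original text. The `Segment` type and `segments` function are hypothetical and unrelated to the actual TrimmedLiteral or Cluster types. The sketch assumes spans are counted in characters and that the padding is exactly " * ".

```rust
use std::ops::Range;

/// Hypothetical segment: one padding-free slice plus its range in the
/// original text, measured in characters rather than bytes.
#[derive(Debug, PartialEq)]
pub struct Segment<'a> {
    pub text: &'a str,
    pub char_range: Range<usize>,
}

/// Split `content` into per-line segments, skipping a leading ` * ` on each
/// line, and track character offsets into `content`.
pub fn segments(content: &str) -> Vec<Segment<'_>> {
    let mut out = Vec::new();
    let mut char_offset = 0; // characters consumed before the current line
    for line in content.split('\n') {
        let text = line.strip_prefix(" * ").unwrap_or(line);
        let line_len = line.chars().count();
        let text_len = text.chars().count();
        // Characters removed as padding at the start of this line.
        let skipped = line_len - text_len;
        let start = char_offset + skipped;
        out.push(Segment { text, char_range: start..start + text_len });
        char_offset += line_len + 1; // +1 for the '\n' separator
    }
    out
}
```

With the input " * 🦚 ok\n * next", the word "next" starts at character 11 but at byte 14, because the emoji occupies four bytes in UTF-8. Mixing the two units is exactly the kind of bug that yields wrong spans downstream.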

> Should rendered include the padding strings, and if so should a separate field be added to TrimmedLiteral that stores the content with padding strings removed? Thanks again!

rendered should definitely not contain those. It was always meant to be the content representation that is presented to the next stage in the documentation generation pipeline (here: parsing with markdown/cmark), and hence must be cleared of any control/style chars that are only relevant to the Rust documentation annotation.

Just explaining this to you makes it clear that there is a lot of documentation missing, that quite a few design choices are not obvious, and that the naming of things could be improved quite a bit. Very much appreciate your effort!

@pjsier let me know if there is anything else I can do to help you drive this to completion :)

drahnr · 2021-10-27T12:15:38Z

@pjsier gentle ping :)

pjsier · 2021-10-27T21:19:52Z

@drahnr thanks for following up on this! I haven't had as much time as I would have liked, so if I'm holding anything up feel free to take over and thanks for all the input

drahnr · 2021-10-28T08:24:40Z

> @drahnr thanks for following up on this! I haven't had as much time as I would have liked, so if I'm holding anything up feel free to take over and thanks for all the input

I think it's better if you push it over the finish line :) there is no particular rush here - take your time!

drahnr reviewed Oct 11, 2021


drahnr reviewed Oct 11, 2021


drahnr reviewed Oct 11, 2021


drahnr added the enhancement 🦚 label Oct 13, 2021

drahnr added this to the v0.9.0 milestone Oct 13, 2021

drahnr reviewed Oct 14, 2021


pjsier added 3 commits October 16, 2021 08:43

feat: WIP detect padding in doc comments

4be5c7e

refactor: updates from review comments

c59e6b4

wip updates

03b0d5a

pjsier force-pushed the feat/doc-comments-133 branch from 268ad24 to 03b0d5a Compare October 16, 2021 13:54

drahnr self-assigned this Oct 20, 2021

drahnr removed this from the v0.9.0 milestone Oct 28, 2021

drahnr added the stale 😐, good first issue 🔰, help wanted 🤝, and internal ⚙️ labels Jan 14, 2022
