Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle bold/strong nested inside italics/em #307

Conversation

cddude229
Copy link

@cddude229 cddude229 commented May 2, 2024

While trying out this library, my initial input ran up against some edge cases. (It handled everything else perfectly though, which is awesome.). I decided to take a stab at fixing the bugs, which is this PR. I've broken down every test case addition + related fix into its own commit for easy review.

Here's the summary of changes in input/output:

Input Output in production Output after this PR
*italics **and bold** end* <p>*italics <strong>and bold</strong> end*</p> <p><em>italics <strong>and bold</strong> end</em></p>
*italics **and bold*** <p>*italics <strong>and bold</strong>*</p> <p><em>italics <strong>and bold</strong></em></p>
***bold** and italics* <p>*<strong>bold</strong> and italics*</p> <p><em><strong>bold</strong> and italics</em></p>
*start **bold** and italics* <p>*start <strong>bold</strong> and italics*</p> <p><em>start <strong>bold</strong> and italics</em></p>
*improper **nesting* is** bad <p>*improper <strong>nesting* is</strong> bad</p> <p><em>improper **nesting</em> is** bad</p>

@cddude229 cddude229 marked this pull request as ready for review May 2, 2024 02:14
@miekg
Copy link
Collaborator

miekg commented May 2, 2024

I don't see how the testcase makes the case for this PR?

Don't understand the random i +=2. Comments also didn't change even though the code around it changed very much

@kjk
Copy link
Contributor

kjk commented May 2, 2024

@miekg It does seem to fix the following:
In current code:
*italics **and bold** end*
=>
<p>*italics <strong>and bold</strong> end*</p>
See https://tools.arslexis.io/goplayground/#iPKnCL5art6

Per babelmark, most other return:
<p><em>italics <strong>and bold </strong> end</em></p>
see: https://babelmark.github.io/?text=*italics+**and+bold**+end*

The case *improper **nesting* is** bad gets closer to what majority does but not fully (see https://babelmark.github.io/?text=*improper+**nesting*+is**+bad)

…t treat them as the italics marker

This is giving us left-to-right precedence for applying styles (which is why the one existing test case changed)
This is a feedback loop from the "is next character c?" check.  Without this move, we may as well just `return 0, nil` from the "is next character c?" check.

This is an explicit edge case of ending with a triple
@cddude229 cddude229 force-pushed the chris-patch-nested-italics-and-bold branch from b27d897 to 46938c9 Compare May 2, 2024 19:11
@cddude229
Copy link
Author

I don't see how the testcase makes the case for this PR?

Don't understand the random i +=2. Comments also didn't change even though the code around it changed very much

Apologies - I got lazy at the end of the work day and should have provided more context. I've updated the PR description and split the main commit into two (to better highlight what each change does) in hopes it better communicates the goals.

I also realized another edge case and added a test for that + fixed it. This reverted the optimization case I was talking about as the bug was there instead.

Lemme know if you need more information!

@@ -1201,7 +1201,7 @@ func helperEmphasis(p *Parser, data []byte, c byte) (int, ast.Node) {
}

if i+1 < len(data) && data[i+1] == c {
i++
i += 2
Copy link
Author

@cddude229 cddude229 May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i++ guarantees the next character checked is c. This means helperFindEmphChar above returns a guaranteed 0 for length, which then short-circuits to the return 0, nil case. We choose to skip the next c character here by incrementing by 2 instead. (This will expose a separate bug if we're actually in the triple c case, which is what the next commit fixes.)

@@ -1192,9 +1192,6 @@ func helperEmphasis(p *Parser, data []byte, c byte) (int, ast.Node) {

for i < len(data) {
length := helperFindEmphChar(data[i:], c)
if length == 0 {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on last commit, in the triple c case, we'd still have short-circuited here - but we actually want to claim this c as our own. So we do the rest of the validity checks first, and then do this check at the end to exit the loop

@cddude229
Copy link
Author

@miekg Just following up on this, as I think I addressed all your concerns. Let me know your thoughts and if you need me to do anything else!

kjk added a commit that referenced this pull request Jul 29, 2024
@kjk
Copy link
Contributor

kjk commented Jul 29, 2024

Thanks, I pushed it in 77f4768

Had to resolve a conflict and add a fix for infinite loop from #311 so couldn't just merge it

@kjk kjk closed this Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants