roundtripping emphasis in emphasis edge case changes document structure #12

Open
ChristianMurphy opened this issue Feb 3, 2021 · 6 comments
Labels
👍 phase/yes Post is accepted and can be worked on 🐛 type/bug This is a problem 🙆 yes/confirmed This is confirmed and ready to be worked on

Comments

@ChristianMurphy
Member

ChristianMurphy commented Feb 3, 2021

Subject of the issue

***emphasis*in emphasis*

is stringified as:

\***emphasis*in emphasis*

which changes the structure of the document when it is parsed again.

Your environment

  • OS: Ubuntu
  • Packages: mdast-util-to-markdown 0.6.2
  • Env: node v15.5.1, npm 7.3.0

Steps to reproduce

parse

***emphasis*in emphasis*

which has the structure

{
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {
                    "type": "text"
                },
                {
                    "type": "emphasis",
                    "children": [
                        {
                            "type": "emphasis",
                            "children": [
                                {
                                    "type": "text"
                                }
                            ]
                        },
                        {
                            "type": "text"
                        }
                    ]
                }
            ]
        }
    ]
}

and stringify it:

\***emphasis*in emphasis*

The resulting markdown, when parsed again, has a different structure from the original:

{
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {
                    "type": "text"
                },
                {
                    "type": "emphasis",
                    "children": [
                        {
                            "type": "text"
                        }
                    ]
                }
            ]
        }
    ]
}

📓 Checking both pieces of markdown with https://spec.commonmark.org/dingus shows that each one is parsed as expected.

Expected behavior

The structure is the same after a round trip.

Actual behavior

The structure is different after a round trip.
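The mismatch can be checked mechanically by comparing only the node types of the two trees. The following is a minimal Python sketch, not part of mdast-util-to-markdown: the `shape` helper and the variable names are hypothetical, and the trees are copied from the report above with values and positions omitted.

```python
def shape(node):
    """Reduce an mdast-like node to its type skeleton (hypothetical helper)."""
    children = node.get("children")
    if children is None:
        return node["type"]
    return (node["type"], [shape(child) for child in children])


# Tree reported for the original input: emphasis nested inside emphasis.
original = {"type": "root", "children": [
    {"type": "paragraph", "children": [
        {"type": "text"},
        {"type": "emphasis", "children": [
            {"type": "emphasis", "children": [{"type": "text"}]},
            {"type": "text"},
        ]},
    ]},
]}

# Tree reported after stringifying and parsing again: the inner emphasis is gone.
roundtripped = {"type": "root", "children": [
    {"type": "paragraph", "children": [
        {"type": "text"},
        {"type": "emphasis", "children": [{"type": "text"}]},
    ]},
]}

# A lossless round trip would make this comparison true.
print(shape(original) == shape(roundtripped))
```

Comparing skeletons rather than full nodes keeps the check independent of text values and positional info, which legitimately differ once an escape is added.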

@ChristianMurphy ChristianMurphy added 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Feb 3, 2021
@wooorm
Member

wooorm commented Feb 4, 2021

Here are some more examples:

a ***b*c d*

a \***b*c d*

a ***b* d*

a \***b* d*

Yields (CM dingus):

<p>a *<em><em>b</em>c d</em></p>
<p>a ***b<em>c d</em></p>
<p>a *<em><em>b</em> d</em></p>
<p>a *<em><em>b</em> d</em></p>

So whether that escape “works” also depends on what comes after the delimiter run.
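To illustrate why the text after the run matters, here is a rough Python sketch of the CommonMark left- and right-flanking test for `*` runs. The function name `classify_star_runs` is made up for this example, and it simplifies the spec by treating only ASCII punctuation as punctuation. It is not how micromark or mdast-util-to-markdown implement this.

```python
import string


def classify_star_runs(text):
    """Return (length, can_open, can_close) for each unescaped run of '*'.

    Simplified sketch: ASCII punctuation only; line start/end count as whitespace.
    """
    runs = []
    i = 0
    while i < len(text):
        # A backslash before ASCII punctuation escapes it, so skip both characters.
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in string.punctuation:
            i += 2
            continue
        if text[i] != "*":
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == "*":
            j += 1
        before = text[i - 1] if i > 0 else " "
        after = text[j] if j < len(text) else " "
        ws_before, ws_after = before.isspace(), after.isspace()
        p_before, p_after = before in string.punctuation, after in string.punctuation
        # For '*', "can open" is left-flanking and "can close" is right-flanking.
        can_open = not ws_after and (not p_after or ws_before or p_before)
        can_close = not ws_before and (not p_before or ws_after or p_after)
        runs.append((j - i, can_open, can_close))
        i = j
    return runs


for sample in ["a ***b*c d*", "a \\***b*c d*", "a ***b* d*", "a \\***b* d*"]:
    print(sample, classify_star_runs(sample))
```

In `a ***b*c d*` the middle `*` can both open and close, while in `a ***b* d*` it can only close. The escape shortens the first run from 3 to 2, which only matters in the first case.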

I don’t really see an (easy) fix 🤔

@ChristianMurphy
Member Author

ChristianMurphy commented Feb 5, 2021

A related edge case can happen at the tail of emphasis next to emphasis:

*a*_b__

@ChristianMurphy ChristianMurphy added 🙆 yes/confirmed This is confirmed and ready to be worked on and removed 🙉 open/needs-info This needs some more info labels May 19, 2021
@wooorm wooorm added the 👍 phase/yes Post is accepted and can be worked on label Aug 21, 2021
@github-actions

Hi! This was marked as ready to be worked on! Note that while this is ready to be worked on, nothing is said about priority: it may take a while for this to be solved.

Is this something you can and want to work on?

Team: please use the area/* (to describe the scope of the change), platform/* (if this is related to a specific one), and semver/* and type/* labels to annotate this. If this is first-timers friendly, add good first issue and if this could use help, add help wanted.

@wooorm
Member

wooorm commented Feb 4, 2022

I came up with a way to solve this, I think: syntax-tree/unist#60 (comment).

@wooorm
Member

wooorm commented Oct 29, 2024

FYI, syntax-tree/unist#60 is solved, but it is different from this issue. Most of the linked issues above are solved because of syntax-tree/unist#60, but not this issue itself.


I have been thinking more and more about this, and it gets incredibly complex.

*x*y z*

**x*y z*

***x*y z*

****x*y z*

*****x*y z*

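Yields (as HTML):

<p><em>x</em>y z*</p>
<p>**x<em>y z</em></p>
<p>*<em><em>x</em>y z</em></p>
<p>**<em><em>x</em>y z</em></p>
<p>*****x<em>y z</em></p>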

(dingus: https://spec.commonmark.org/dingus/?text=*x*y%20z*%0A%0A**x*y%20z*%0A%0A***x*y%20z*%0A%0A****x*y%20z*%0A%0A*****x*y%20z*%0A).

Please see the points near here: https://spec.commonmark.org/0.31.2/#can-open-emphasis.

There’s something at play here where the later delimiter runs and the length of the opening run (the “multiple of 3” rule) affect each other, and escaping a marker changes that interaction by shortening the run.

I have no clue how to deal with such complex escaping needs, where pulling a thread in one place makes something happen somewhere entirely different.
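The “multiple of 3” restriction in the CommonMark emphasis rules can be written down as a small predicate. This is only a sketch of that one rule in Python, with a hypothetical function name and parameters; it is not a full emphasis parser and not code from any of the packages discussed here.

```python
def rule_of_three_blocks(opener_len, closer_len, opener_both, closer_both):
    """True if the 'multiple of 3' rule forbids pairing these two delimiter runs.

    opener_len / closer_len: original lengths of the two runs.
    opener_both / closer_both: whether that run can both open and close emphasis.
    """
    if not (opener_both or closer_both):
        return False  # the rule only applies when one side can open and close
    total = opener_len + closer_len
    both_multiples = opener_len % 3 == 0 and closer_len % 3 == 0
    return total % 3 == 0 and not both_multiples


# In '*x*y z*' and its variants, the '*' between 'x' and 'y' can open and close,
# and the opening run at the start of the line can only open.
for n in range(1, 6):
    blocked = rule_of_three_blocks(n, 1, opener_both=False, closer_both=True)
    print("*" * n + "x*y z*", "blocked" if blocked else "pairs")
```

Opening runs of length 2 and 5 are blocked from pairing with the middle `*`, which lines up with the two outputs above where `x` is not emphasized. Escaping one marker turns a run of 3 into a run of 2, moving it from the “pairs” case into the “blocked” case.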

@wooorm
Member

wooorm commented Oct 29, 2024

Perhaps... I will look into it later. My brain is broken. But maybe we can do this by switching markers when we would otherwise have to escape a surrounding marker:

***x*y z*

\*_*x*y z_

***x*

\*\*_x_

*x**

_x_\*

_x__

*x*\_

Each pair generates the same HTML.
