Create ID's for Header elements so they can be referenced in anchor tags #765

netniV · 2020-04-29T16:56:35Z

This allows rendered markdown to be compatible when viewed via parsed markdown or GitHub

Closes #710

Ayesh · 2020-05-04T15:22:47Z

Parsedown.php

@@ -553,15 +553,19 @@ protected function blockHeader($Line)
        }

        $text = trim($text, ' ');
+        $link = strtolower(str_replace(' ','-',$text));


I think a more elaborate "sluggify" mechanism would be better here. For example, strtolower(preg_replace('/[^A-z-_]+'/, '-', $text)) to replace all non A-z/-/_ characters with a dash.

Yeah, I'm not sure what GitHub's method of replacement is but think we should probably be using:

strtolower(preg_replace('/[^A-Za-z0-9\-_]+/', '-', $text))

Would you agree with that one?

https://regex101.com/r/jWgxEc/1 - This is an example but I also had to include \v to prevent new lines from being picked up in the example.

You are right, this approach is a better start. GitHuv flavor looks quite complicated and forgiving, so I put my regex hat.

Test gist: https://gist.github.com/Ayesh/250786888e7f4f146117aa96afcd0071
Suggested regex: [^\p{L}\p{N}\p{M}-]+
Implementation: https://3v4l.org/melBh

The third heading in the Gist contains letters from my Sinhalese language. It has letters in both \p{L} (letters) and \p{M} (marks). These two, combined with \p{N} (numbers) and the dash gives us the negating regex that strips out everything else.

This library already requires mbstring extension, so I don't see a problem with using mb_strtolower.

OK, seems better 👍 I love it when a plan comes together.

I have just pushed a change, which takes parts of yours but changes str_replace as well since we are talking potential multi-byte characters. Seems to work with the gist, let me know what you think.

Ayesh · 2020-05-04T19:23:20Z

I was reading how league/commonmark does the slugs, and it turns out to be stripping a lot characters as well, so I sent a PR thephpleague/commonmark#467

Ayesh · 2020-05-04T19:25:57Z

Parsedown.php

@@ -553,7 +553,7 @@ protected function blockHeader($Line)
        }

        $text = trim($text, ' ');
-        $link = strtolower(str_replace(' ','-',$text));
+	$link = preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', mb_strtolower(mb_ereg_replace(' ','-',$text)));


Do we absolutely need the mb_ereg_replace() here? It's not a regular expression, so we should be fine using str_replace(' ', '-', $text). GFM seems to convert consecutive spaces to consecutive dashes as well.

I'm not a decision maker, but as someone who spends a lot of time processing text and with this library, this looks great to me otherwise. Thank you!

I think so, as if you consider a four byte character, one of those could match a space. Thus it would invalidate that character or change it to something it's not meant to be. I never deal with them but if we really want to make it universal, I would suggest we do.

netniV · 2020-05-04T19:58:52Z

I was reading how php-league/commonmark does the slugs, and it turns out to be stripping a lot characters as well, so I sent a PR thephpleague/commonmark#467

Didn't even know about that one 👍

netniV · 2020-05-04T20:00:43Z

so the tests are failing now because the leading # is no longer applicable. The question is, should we trim the leading hyphen?

aidantwoods · 2020-05-04T21:22:55Z

Hi @netniV and @Ayesh thanks for the great work you've done towards these changes to arrive at a very sensible slug function :)

The topic of slugs (or header ID's) have come up a few times in the past (e.g.), and the position has previously been that there wasn't a universal choice that everyone using the library would want, so to avoid making the choice for everyone we'd leave it alone and up to extensions.

I've built upon the changes you've worked on here in PR #767, which modifies these changes for the new 2.0.x branch, as well as adding the option for a user-defined slug function (so that anyone who doesn't like the default can specify their own).

@netniV I've listed you as a co-author in the commit to that PR (which I hope is okay?) because most of the work has been done here in arriving at a sensible slug function to use.

netniV · 2020-05-04T22:01:46Z

Looks good. I added one comment as per our discussions here over str_replace. More involved is your change but I was thinking it should be an optional addition as it could affect other elements due to a duplication of ID

The only other thought is with trimming a trailing and leading hyphen as that seems wrong to keep those to me.

netniV · 2020-05-12T16:07:09Z

Does this mean this PR should now be closed as it will only be applied to the 2.0 version?

cpeel · 2021-01-02T17:22:04Z

@aidantwoods - is there a timeframe for when 2.x will be released? I'd rather have my users use the auto IDs for headers than enable ParsedownExtra.

netniV · 2021-05-16T09:38:48Z

It would be nice if this PR could be merged into the 1.8.x branch and a 1.8.1 released. Either that, or have 2.0 released. At the moment, I can't get these changes through composer without publishing my own version of the package.

Ayesh · 2021-05-16T09:53:11Z

I don't think drastic changes like this would be realistically merged to the 1.x branch. What I do is simply extend the Parsedown class, and override the necessary parts. It turned out a good approach long term because I wanted to add some extra markup on my own too.

netniV · 2021-05-16T09:58:18Z

Let me see if I can manage that. I did something for images so it may be possible.

Personally, I wouldn't call it a drastic change, but I can see ID conflicts could break things so I see where you are coming from. I would submit a forked version with it in, but that seems a waste for such a small feature.

netniV · 2021-05-16T11:19:14Z

Firstly, I think it would be good to have this 1.8.0-beta7 as a full release given there has been no other changes in the past year that I can see (Aug 2020). Requiring a beta version in a composer.json looks ... well bad to me. The reason that I need to use the beta version is for handler array usage.

Now that I have fixed on the latest versions, I am still getting a problem which is due to changes in this beta version causing an issue with your parsedown-extra class.

netniV/ParsedownID#1

netniV · 2021-05-16T12:37:14Z

@Ayesh or @aidantwoods do you have the ability to sort out the current mess of Parsedown vs Parsedown-Extra ? Due to the beta changes that haven't been published as a version, parsedown-extra relies on older versions. But to use the newer array-based handler definitions, you need this beta version and parsedown-extra apparently hasn't been updated in two years so can't handle it.

aidantwoods · 2021-10-16T01:00:38Z

It's been a while, but now very close to a finished product for 2.0 in Parsedown (#708) and ParsedownExtra (erusev/parsedown-extra#172). I'd appreciate any feedback you have :)

netniV · 2021-10-16T01:44:00Z

I will try and see if I have time. Things are rather busy at the moment though.

Create ID's for Header elements so they can be referenced in anchor tags

7ead769

Closes erusev#710

netniV force-pushed the master branch from 1ec668e to 7ead769 Compare April 29, 2020 16:57

Fix up unit tests following previous commit

cd6912d

Ayesh reviewed May 4, 2020

View reviewed changes

Fix issue with multi-byte string compatibility

680287b

Ayesh approved these changes May 4, 2020

View reviewed changes

aidantwoods mentioned this pull request May 4, 2020

Create ID's for Header elements so they can be referenced in anchor tags #767

Merged

gnowland added a commit to gnowland/parsedown that referenced this pull request Nov 4, 2020

add slug support erusev#765

e355231

cpeel mentioned this pull request Jan 2, 2021

Add ParsedownExtra extension DistributedProofreaders/dproofreaders#546

Merged

netniV mentioned this pull request May 16, 2021

Undefined index: text erusev/parsedown-extra#152

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create ID's for Header elements so they can be referenced in anchor tags #765

Create ID's for Header elements so they can be referenced in anchor tags #765

netniV commented Apr 29, 2020 •

edited

Loading

Ayesh May 4, 2020

netniV May 4, 2020 •

edited

Loading

netniV May 4, 2020

Ayesh May 4, 2020

netniV May 4, 2020

netniV May 4, 2020

Ayesh commented May 4, 2020

Ayesh May 4, 2020

netniV May 4, 2020

netniV commented May 4, 2020

netniV commented May 4, 2020 •

edited

Loading

aidantwoods commented May 4, 2020

netniV commented May 4, 2020

netniV commented May 12, 2020

cpeel commented Jan 2, 2021

netniV commented May 16, 2021

Ayesh commented May 16, 2021

netniV commented May 16, 2021

netniV commented May 16, 2021

netniV commented May 16, 2021

aidantwoods commented Oct 16, 2021

netniV commented Oct 16, 2021

Create ID's for Header elements so they can be referenced in anchor tags #765

Are you sure you want to change the base?

Create ID's for Header elements so they can be referenced in anchor tags #765

Conversation

netniV commented Apr 29, 2020 • edited Loading

Ayesh May 4, 2020

Choose a reason for hiding this comment

netniV May 4, 2020 • edited Loading

Choose a reason for hiding this comment

netniV May 4, 2020

Choose a reason for hiding this comment

Ayesh May 4, 2020

Choose a reason for hiding this comment

netniV May 4, 2020

Choose a reason for hiding this comment

netniV May 4, 2020

Choose a reason for hiding this comment

Ayesh commented May 4, 2020

Ayesh May 4, 2020

Choose a reason for hiding this comment

netniV May 4, 2020

Choose a reason for hiding this comment

netniV commented May 4, 2020

netniV commented May 4, 2020 • edited Loading

aidantwoods commented May 4, 2020

netniV commented May 4, 2020

netniV commented May 12, 2020

cpeel commented Jan 2, 2021

netniV commented May 16, 2021

Ayesh commented May 16, 2021

netniV commented May 16, 2021

netniV commented May 16, 2021

netniV commented May 16, 2021

aidantwoods commented Oct 16, 2021

netniV commented Oct 16, 2021

netniV commented Apr 29, 2020 •

edited

Loading

netniV May 4, 2020 •

edited

Loading

netniV commented May 4, 2020 •

edited

Loading