-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create ID's for Header elements so they can be referenced in anchor tags #765
base: master
Are you sure you want to change the base?
Conversation
Parsedown.php
Outdated
@@ -553,15 +553,19 @@ protected function blockHeader($Line) | |||
} | |||
|
|||
$text = trim($text, ' '); | |||
$link = strtolower(str_replace(' ','-',$text)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more elaborate "sluggify" mechanism would be better here. For example, strtolower(preg_replace('/[^A-z-_]+'/, '-', $text))
to replace all non A-z/-/_ characters with a dash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure what GitHub's method of replacement is but think we should probably be using:
strtolower(preg_replace('/[^A-Za-z0-9\-_]+/', '-', $text))
Would you agree with that one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://regex101.com/r/jWgxEc/1 - This is an example but I also had to include \v to prevent new lines from being picked up in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this approach is a better start. GitHuv flavor looks quite complicated and forgiving, so I put my regex hat.
Test gist: https://gist.github.com/Ayesh/250786888e7f4f146117aa96afcd0071
Suggested regex: [^\p{L}\p{N}\p{M}-]+
Implementation: https://3v4l.org/melBh
The third heading in the Gist contains letters from my Sinhalese language. It has letters in both \p{L}
(letters) and \p{M}
(marks). These two, combined with \p{N}
(numbers) and the dash gives us the negating regex that strips out everything else.
This library already requires mbstring extension, so I don't see a problem with using mb_strtolower
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, seems better 👍 I love it when a plan comes together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just pushed a change, which takes parts of yours but changes str_replace as well since we are talking potential multi-byte characters. Seems to work with the gist, let me know what you think.
I was reading how |
@@ -553,7 +553,7 @@ protected function blockHeader($Line) | |||
} | |||
|
|||
$text = trim($text, ' '); | |||
$link = strtolower(str_replace(' ','-',$text)); | |||
$link = preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', mb_strtolower(mb_ereg_replace(' ','-',$text))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we absolutely need the mb_ereg_replace()
here? It's not a regular expression, so we should be fine using str_replace(' ', '-', $text)
. GFM seems to convert consecutive spaces to consecutive dashes as well.
I'm not a decision maker, but as someone who spends a lot of time processing text and with this library, this looks great to me otherwise. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, as if you consider a four byte character, one of those could match a space. Thus it would invalidate that character or change it to something it's not meant to be. I never deal with them but if we really want to make it universal, I would suggest we do.
Didn't even know about that one 👍 |
so the tests are failing now because the leading # is no longer applicable. The question is, should we trim the leading hyphen? |
Hi @netniV and @Ayesh thanks for the great work you've done towards these changes to arrive at a very sensible slug function :) The topic of slugs (or header ID's) have come up a few times in the past (e.g.), and the position has previously been that there wasn't a universal choice that everyone using the library would want, so to avoid making the choice for everyone we'd leave it alone and up to extensions. I've built upon the changes you've worked on here in PR #767, which modifies these changes for the new @netniV I've listed you as a co-author in the commit to that PR (which I hope is okay?) because most of the work has been done here in arriving at a sensible slug function to use. |
Looks good. I added one comment as per our discussions here over str_replace. More involved is your change but I was thinking it should be an optional addition as it could affect other elements due to a duplication of ID The only other thought is with trimming a trailing and leading hyphen as that seems wrong to keep those to me. |
Does this mean this PR should now be closed as it will only be applied to the 2.0 version? |
@aidantwoods - is there a timeframe for when 2.x will be released? I'd rather have my users use the auto IDs for headers than enable ParsedownExtra. |
It would be nice if this PR could be merged into the 1.8.x branch and a 1.8.1 released. Either that, or have 2.0 released. At the moment, I can't get these changes through composer without publishing my own version of the package. |
I don't think drastic changes like this would be realistically merged to the 1.x branch. What I do is simply extend the |
Let me see if I can manage that. I did something for images so it may be possible. Personally, I wouldn't call it a drastic change, but I can see ID conflicts could break things so I see where you are coming from. I would submit a forked version with it in, but that seems a waste for such a small feature. |
Firstly, I think it would be good to have this 1.8.0-beta7 as a full release given there has been no other changes in the past year that I can see (Aug 2020). Requiring a beta version in a composer.json looks ... well bad to me. The reason that I need to use the beta version is for handler array usage. Now that I have fixed on the latest versions, I am still getting a problem which is due to changes in this beta version causing an issue with your parsedown-extra class. |
@Ayesh or @aidantwoods do you have the ability to sort out the current mess of Parsedown vs Parsedown-Extra ? Due to the beta changes that haven't been published as a version, parsedown-extra relies on older versions. But to use the newer array-based handler definitions, you need this beta version and parsedown-extra apparently hasn't been updated in two years so can't handle it. |
It's been a while, but now very close to a finished product for 2.0 in Parsedown (#708) and ParsedownExtra (erusev/parsedown-extra#172). I'd appreciate any feedback you have :) |
I will try and see if I have time. Things are rather busy at the moment though. |
This allows rendered markdown to be compatible when viewed via parsed markdown or GitHub
Closes #710