-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create ID's for Header elements so they can be referenced in anchor tags #767
Create ID's for Header elements so they can be referenced in anchor tags #767
Conversation
7bfa9df
to
a39766a
Compare
Thank you, this is nice and looking forward to the 2.x releases. I also think it's better to make the slug function configurable. This library ideally should come from a DI container, and it makes sense that we make the slug function customizable with a sane default. |
If I'd suggest one thing if I may, I think the regex would be better with
So instead of |
Done, 99dd44e :) |
if (! isset($slugCallback)) { | ||
$this->slugCallback = function (string $text): string { | ||
$slug = \mb_strtolower($text); | ||
$slug = \str_replace(' ', '-', $slug); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need to be an mb_ereg_replace or other Unicode compatible replace as str_replace isn’t mb aware and code break any Unicode character that has a byte code matching it I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking a lot about the usage of mb_ereg_replace
here. In Unicode, we have "spaces":
- U+0020 standard space (1 byte)
- U+200B zero width space (3 bytes)
- U+200C zero width non-joiner Unicode code point (3 bytes)
- U+200D zero width joiner Unicode code point (3 bytes)
- U+FEFF zero width no-break space Unicode code point (3 bytes)
str_replace(' ', '-')
can take care of the standard U+0020
because it's a visible character and we can easily replace it. It's possible to replace multi-byte characters with str_replace
, but for zero-width joiners/spaces, it's error prone and difficult to maintain.
However, with out str_replace(' ', '-') + preg_replace()
combo, we can eliminate all of them. Each U+0020
is replaced with -
, which follows GFM by str_replace
. Other 4 "spaces" are removed in the preg_match
call because they don't belong do the character classes we allow-list. By definition, Unicode must be not including spaces or other symbols in \p{L}
or \p{M}
classes. We also tell regex engine give the regex engine a heads-up about possible Unicode characters in expression and subject with the /u
flag.
I suppose an mb_ereg_replace
is probably necessary if the PHP file we are working with is encoded in UTF-16 or UTF-32, but PHP engine will not work with either of those encodings at all, so this wouldn't be a use case we'd have to handle at the library level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I can't argue with that :)
Trimming the leading and trailing hyphens seems sensible, I think I can also deal with the ID duplication too—GitHub seems to resolve this by appending |
It’s amazing how few lines of code can balloon ;) Work is looking good. |
f84ea1b
to
3a80a52
Compare
Indeed 😉 I moved some of the irrelevant test fixes out of this PR and have now rebased, so it is looking a little more focused now. Still at ~350 additions though :) Fortunately for anyone using these changes, they'll be able to modify slug behaviours (if they wish) with essentially a one-liner—which is quite nice :) |
Adds HeaderSlug configurable, with the option for the slug function to be customised. Co-authored-by: netniV <[email protected]>
As suggested by @Ayesh Co-authored-by: Ayesh Karunaratne <[email protected]>
3a80a52
to
8764512
Compare
I moved the |
Credit for these changes goes to the work done in #765 by @netniV and @Ayesh.
This PR rephrases the changes made there toward the 2.0.x branch.