Create ID's for Header elements so they can be referenced in anchor tags #767

aidantwoods · 2020-05-04T21:22:38Z

Credit for these changes goes to the work done in #765 by @netniV and @Ayesh.

This PR rephrases the changes made there toward the 2.0.x branch.

Ayesh · 2020-05-04T21:31:16Z

Thank you, this is nice and looking forward to the 2.x releases. I also think it's better to make the slug function configurable. This library ideally should come from a DI container, and it makes sense that we make the slug function customizable with a sane default.

Ayesh · 2020-05-04T21:35:22Z

If I may suggest one thing: I think the regex would be better with \p{Nd}\p{Nl} instead of \p{N}.

\p{N} includes superscript numbers and divisions, which ideally should not make it into a URL, and GFM does not keep them either.

So instead of $slug = \preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', $slug);, we could use $slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);. This converts 测试 x² 标题 to 测试-x-标题 (notice that the ² is gone from the slug).
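A minimal sketch of the difference between the two character classes, run on the sample heading from the comment above. The helper name slugWith is hypothetical and exists only for this comparison; the three steps mirror the lowercase, space-to-hyphen and allow-list steps discussed in this PR, but this is not the merged implementation.

```php
<?php
// Hypothetical helper for illustration: apply the slug steps with a given allow-list pattern.
function slugWith(string $pattern, string $text): string
{
    $slug = mb_strtolower($text, 'UTF-8');
    $slug = str_replace(' ', '-', $slug);

    // Remove every run of characters that is not in the allow-list.
    return preg_replace($pattern, '', $slug);
}

$heading = '测试 x² 标题';

// \p{N} covers all numbers, including U+00B2 SUPERSCRIPT TWO (general category No).
$broad = slugWith('/[^\p{L}\p{N}\p{M}-]+/u', $heading);

// \p{Nd}\p{Nl} covers decimal digits and letter numbers only, so the superscript is dropped.
$narrow = slugWith('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', $heading);

echo $broad, "\n";
echo $narrow, "\n";
```

With the broader class the superscript survives in the slug; with the narrower class it is removed, which matches the result quoted above.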

aidantwoods · 2020-05-04T21:42:44Z

So instead of $slug = \preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', $slug);, we could use $slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);.

Done, 99dd44e :)

netniV · 2020-05-04T21:58:22Z

src/Configurables/HeaderSlug.php

+        if (! isset($slugCallback)) {
+            $this->slugCallback = function (string $text): string {
+                $slug = \mb_strtolower($text);
+                $slug = \str_replace(' ', '-', $slug);


This may need to be an mb_ereg_replace or other Unicode compatible replace as str_replace isn’t mb aware and could break any Unicode character that has a byte code matching it I think

I was thinking a lot about the usage of mb_ereg_replace here. In Unicode, we have "spaces":

U+0020 standard space (1 byte)

U+200B zero width space (3 bytes)

U+200C zero width non-joiner Unicode code point (3 bytes)

U+200D zero width joiner Unicode code point (3 bytes)

U+FEFF zero width no-break space Unicode code point (3 bytes)

str_replace(' ', '-') can take care of the standard U+0020 because it's a visible character and we can easily replace it. It's possible to replace multi-byte characters with str_replace, but for zero-width joiners/spaces, it's error prone and difficult to maintain.

However, with our str_replace(' ', '-') + preg_replace() combo, we can eliminate all of them. Each U+0020 is replaced with - by str_replace, which follows GFM. The other 4 "spaces" are removed in the preg_replace call because they don't belong to the character classes we allow-list. By definition, Unicode does not include spaces or other symbols in the \p{L} or \p{M} classes. We also give the regex engine a heads-up about possible Unicode characters in the expression and subject with the /u flag.
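The combination described above can be sketched as follows. The sample heading is made up for illustration and hides each of the four zero-width code points listed earlier. One extra point worth stating: in UTF-8, every byte of a multi-byte sequence is 0x80 or higher, so the byte 0x20 can only ever be a real U+0020, which is why a byte-wise str_replace on a plain space cannot split another character.

```php
<?php
// Hypothetical input: zero-width code points placed around and inside the words.
$zwsp = "\u{200B}"; // zero width space
$zwnj = "\u{200C}"; // zero width non-joiner
$zwj  = "\u{200D}"; // zero width joiner
$bom  = "\u{FEFF}"; // zero width no-break space

$text = "{$bom}Hello{$zwsp} Wor{$zwnj}ld{$zwj} 标题";

$slug = mb_strtolower($text, 'UTF-8');

// Safe in UTF-8: 0x20 never appears inside a multi-byte sequence.
$slug = str_replace(' ', '-', $slug);

// The four zero-width code points are format characters (Cf), so they fall
// outside \p{L}, \p{Nd}, \p{Nl} and \p{M} and are removed here.
$slug = preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);

echo $slug, "\n";
```

Only the visible letters and the hyphens produced from U+0020 remain in the result.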

I suppose an mb_ereg_replace would probably be necessary if the PHP file we are working with were encoded in UTF-16 or UTF-32, but the PHP engine will not work with either of those encodings at all, so this isn't a use case we'd have to handle at the library level.

Well, I can't argue with that :)

aidantwoods · 2020-05-05T15:56:21Z

@netniV RE your comment

Looks good. I added one comment as per our discussions here over str_replace. More involved is your change but I was thinking it should be an optional addition as it could affect other elements due to a duplication of ID

The only other thought is with trimming a trailing and leading hyphen as that seems wrong to keep those to me.

Trimming the leading and trailing hyphens seems sensible, I think I can also deal with the ID duplication too—GitHub seems to resolve this by appending -n to duplicated IDs (where n is an increasing counter per duplication), and I think this is a sensible approach to use. I'll probably generalise it a bit so that custom handling of de-duplication is possible (e.g. maybe you want to use a prefix, or a different separator).
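A rough sketch of the trimming and de-duplication idea described above, under the assumption that the first occurrence keeps its slug and later duplicates get -1, -2, and so on. The class ExampleSlugCounter is hypothetical and is not the SlugRegister added in this PR; customisable separators or prefixes are left out to keep it short.

```php
<?php
// Hypothetical helper, for illustration only.
final class ExampleSlugCounter
{
    /** @var array<string, int> highest suffix handed out so far for each slug seen */
    private $seen = [];

    public function unique(string $slug): string
    {
        // Leading and trailing hyphens are dropped before the duplicate check.
        $slug = trim($slug, '-');

        if (! isset($this->seen[$slug])) {
            $this->seen[$slug] = 0;

            return $slug;
        }

        // Append -n, increasing n until the candidate has not been used yet.
        do {
            $candidate = $slug . '-' . ++$this->seen[$slug];
        } while (isset($this->seen[$candidate]));

        $this->seen[$candidate] = 0;

        return $candidate;
    }
}

$counter = new ExampleSlugCounter();
$ids = array_map([$counter, 'unique'], ['-intro-', 'intro', 'intro', 'intro-1']);

echo implode(', ', $ids), "\n";
```

The last input shows why the loop re-checks each candidate: a heading whose own slug is intro-1 must not collide with an ID that was generated for a duplicate of intro.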

netniV · 2020-05-05T19:12:18Z

It’s amazing how few lines of code can balloon ;)

Work is looking good.

aidantwoods · 2020-05-05T21:41:17Z

It’s amazing how few lines of code can balloon ;)

Indeed 😉

I moved some of the irrelevant test fixes out of this PR and have now rebased, so it is looking a little more focused now. Still at ~350 additions though :)

Fortunately for anyone using these changes, they'll be able to modify slug behaviours (if they wish) with essentially a one-liner—which is quite nice :)
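To illustrate the general shape of a configurable slug function with a default, here is a minimal sketch. ExampleHeaderSlug and its transform method are hypothetical names chosen for this example and should not be read as the HeaderSlug API added in this PR; the default callback simply strings together the steps discussed in this thread.

```php
<?php
// Hypothetical stand-in for a slug configurable; names are illustrative only.
final class ExampleHeaderSlug
{
    /** @var callable(string): string */
    private $slugCallback;

    public function __construct(?callable $slugCallback = null)
    {
        // Fall back to a default slug function when none is supplied.
        $this->slugCallback = $slugCallback ?? function (string $text): string {
            $slug = mb_strtolower($text, 'UTF-8');
            $slug = str_replace(' ', '-', $slug);
            $slug = preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);

            return trim($slug, '-');
        };
    }

    public function transform(string $text): string
    {
        return ($this->slugCallback)($text);
    }
}

$default = new ExampleHeaderSlug();

// Overriding the behaviour only requires passing a different callable.
$custom = new ExampleHeaderSlug(function (string $text): string {
    return str_replace(' ', '_', mb_strtolower($text, 'UTF-8'));
});

echo $default->transform('Hello, World!'), "\n";
echo $custom->transform('Hello World'), "\n";
```

Accepting any callable keeps the default behaviour in one place while letting a DI container or caller swap it out.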


aidantwoods · 2020-05-10T13:34:04Z

I moved the MutableConfigurable changes into a separate PR (#768) and have rebased this a final time (so now only the changes directly related to adding the slug are in this PR :) )

aidantwoods added this to the 2.0.0 milestone May 4, 2020

aidantwoods mentioned this pull request May 4, 2020

Create ID's for Header elements so they can be referenced in anchor tags #765

Open

aidantwoods force-pushed the enhancement/header-slug branch from 7bfa9df to a39766a May 4, 2020 21:24

netniV reviewed May 4, 2020


aidantwoods force-pushed the enhancement/header-slug branch from f84ea1b to 3a80a52 May 5, 2020 21:22

aidantwoods and others added 4 commits May 10, 2020 14:31

Add HeaderSlug configurable

e332b47

Adds HeaderSlug configurable, with the option for the slug function to be customised. Co-authored-by: netniV

Strip superscripts and divisions from eventual slug

d8bf075

As suggested by @Ayesh Co-authored-by: Ayesh Karunaratne

Trim leading and trailing hyphens from slug

4e99e29

Add SlugRegister so IDs are not duplicated

8764512

aidantwoods force-pushed the enhancement/header-slug branch from 3a80a52 to 8764512 May 10, 2020 13:32

aidantwoods merged commit 0c5e8c1 into erusev:2.0.x May 10, 2020

aidantwoods deleted the enhancement/header-slug branch May 10, 2020 13:41
