HTML characters should not convert to symbol when editing as HTML #22337

Open
mapk opened this issue May 13, 2020 · 4 comments
Labels
[Package] HTML entities /packages/html-entities [Type] Bug An existing feature does not function as intended

Comments

@mapk
Contributor

mapk commented May 13, 2020

Describe the bug
Carrying over an issue from here: https://wordpress.org/support/topic/editor-entities-in-text-mode-copy-paste-in-visual-mode/. This issue focused on the Classic Editor, but it seems to be the case with Gutenberg as well.

When editing as HTML and typing the HTML entity for a symbol (e.g. &mdash;), it gets converted to the symbol. However, when typing &amp;, that does not get converted. We should not convert any of them while editing as HTML.

To reproduce
Steps to reproduce the behavior:

  1. Create a Paragraph block. Type some text.
  2. Select the "Edit as HTML" option from the ellipses icon in the toolbar.
  3. Add "&amp;" to the text. Notice that it does not convert.
  4. Now type "&mdash;" and notice that this automatically converts (it should not).

Expected behavior
While editing as HTML, the characters should not convert.
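
The asymmetry in the steps above can be sketched in a few lines. This is only an illustration of the reported behavior, not Gutenberg's actual code: the function names `decode_like_reported` and `keep_as_typed` and the `PROTECTED` set are hypothetical, and the assumption is that only the htmlspecialchars-style entities are left alone.

```python
import html
import re

# Assumption: these are the entities the editor leaves untouched.
PROTECTED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named character references such as &mdash; or &amp;
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_like_reported(source: str) -> str:
    """Decode every named entity except the protected ones (reported behavior)."""
    def replace(match: re.Match) -> str:
        if match.group(1) in PROTECTED:
            return match.group(0)  # keep &amp; and friends as typed
        return html.unescape(match.group(0))  # &mdash; becomes U+2014
    return NAMED_ENTITY.sub(replace, source)


def keep_as_typed(source: str) -> str:
    """Expected behavior in HTML mode: the source is returned unchanged."""
    return source


typed = "salt &amp; pepper &mdash; done"
print(decode_like_reported(typed))
print(keep_as_typed(typed))
```

The first call rewrites only the em dash entity, which matches steps 3 and 4; the second is what the issue asks for.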

Screenshots

[Screenshot: html]

Editor version (please complete the following information):

  • WordPress version: 5.4
  • Gutenberg 8.1

Possibly related to: #13860

@mapk mapk added [Type] Bug An existing feature does not function as intended [Package] HTML entities /packages/html-entities labels May 13, 2020
@joyously

> While editing as HTML, the characters should not convert.

The entities should not ever be converted to characters. The database should contain entities. The browser will show the entities correctly when showing as HTML.

@azaozz
Contributor

azaozz commented May 14, 2020

This is not a simple fix. All HTML entities are also UTF-8 characters (see https://dev.w3.org/html5/html-author/charref) but most are usable only in a web browser. However the post content (or the editor "output") may be used in other places, like RSS feeds, emails, etc. The "htmlspecialchars" are the only entities required for XML/HTML and are generally understood everywhere.
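
To make the portability concern concrete: XML predefines only five entities (amp, lt, gt, quot, apos), so a named reference like &mdash; is not valid in a plain RSS feed, while its numeric form is. The sketch below rewrites named references to numeric ones. It is an assumption-laden illustration, not something WordPress is known to do; `named_to_numeric` is a hypothetical name, and Python's `name2codepoint` table covers only the HTML 4 entity names.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML; these are safe everywhere.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named character references such as &mdash; or &nbsp;
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Rewrite HTML-only named entities as numeric character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave XML-safe entities and unknown names exactly as written.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return NAMED_ENTITY.sub(replace, markup)


print(named_to_numeric("Tom &amp; Jerry &mdash; the end"))
```

Numeric references stay plain ASCII in storage and are understood by both HTML and XML parsers, which is one way the two positions in this thread could be reconciled.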

Storing other entities in the DB would probably cause some backwards compatibility issues and affect several other WP components: Formatting, Charset, perhaps Database, and possibly others.

@pipfrosch

Hi, I would like to express the issue from an accessibility point of view.
I have epilepsy and have hit my head a lot. As a result, the pathways from my brain to my fingers do weird things when typing: I frequently type similar but different words to what is in my brain. I think the incorrect muscle memory gets triggered and sent to the fingers, but I'm not sure.

What does that have to do with entities? Well, I have trouble visually distinguishing left/right single/double quotes, em/en dashes, etc., so I type the entity (I use the numbered entities because I do a lot in XML, where HTML entities aren't defined). When proofreading, it is easier for me to distinguish &#8216; from &#8217; than it is to visually distinguish ‘ from ’.

But in WordPress they get converted, so when proofreading I have trouble determining whether the wrong combination came out of my fingers.

@lathanh

lathanh commented May 31, 2024

This issue is also preventing me from being able to use non-BMP unicode (including emoji) at all.

If I try to save a draft with such a character, I get the error "Updating failed. Could not update post in the database." I believe this is because my MySQL database[1] uses utf8mb3, which can only store BMP characters (which excludes many characters, such as “🛈”, and most emoji).

So, I tried to enter the HTML entity instead (as demonstrated by the OP), but the editor automatically replaces it with the Unicode character, thwarting my attempt to use entities as a workaround (and I haven't found any other workaround).

[1] I'm on a hosted solution (EasyWP) where I'm not sure that I can change the database/table/column character sets. Even if I could, I still think this behavior in WP should be addressed.
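
For reference, the workaround attempted above amounts to the following. utf8mb3 stores at most three bytes per character, which covers code points up to U+FFFF only, so anything above that has to be written as a numeric character reference to stay storable. This is a minimal sketch with a hypothetical helper name, `escape_non_bmp`; it only helps if the editor leaves the resulting entities alone, which is exactly what this issue is about.

```python
def escape_non_bmp(text: str) -> str:
    """Replace code points above U+FFFF with decimal character references."""
    return "".join(
        f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch  # BMP characters pass through
        for ch in text
    )


# U+1F6C8 is the information-source symbol mentioned in the comment above.
print(escape_non_bmp("info \U0001F6C8"))  # prints: info &#128712;
```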
