[Data Liberation] Add HTML to Blocks converter #2095
Adds a basic WP_HTML_To_Blocks class that accepts HTML and outputs block markup. It only considers the markup and won't consider any visual changes introduced via CSS or JavaScript.

A part of #1894

## Example

```php
$html = <<<HTML
<meta name="post_title" content="My first post">
<p>Hello <b>world</b>!</p>
HTML;

$converter = new WP_HTML_To_Blocks( $html );
$converter->convert();

var_dump( $converter->get_all_metadata() );
/*
 * array( 'post_title' => array( 'My first post' ) )
 */

var_dump( $converter->get_block_markup() );
/*
 * <!-- wp:paragraph -->
 * <p>Hello <b>world</b>!</p>
 * <!-- /wp:paragraph -->
 */
```

## Testing instructions

This PR mostly adds new code. Just confirm the unit tests pass in CI.
@@ -0,0 +1,70 @@
<?php

abstract class WP_Entity_Reader implements \Iterator {
We don't need an abstraction yet in this PR, but I'm about to propose another one with more entity readers.
packages/playground/data-liberation/src/wordpress-core-html-api/class-wp-html-processor.php
3a3c8c3 to 92fda0a
It would be good if this creates the same output as the JS (DOM-based) converter. If @dmsnell ends up creating a JS HTML API, we could potentially move the JS one to it. Could the HTML API somehow be language agnostic?
@ellatrix What do you mean by the DOM-based converter? Would this involve changes in this code? Or would this involve designing the DOM-based version to match functionality with the PHP-based version?
Adds a basic WP_HTML_To_Blocks class that accepts HTML and outputs block markup.
It's a very basic converter. It only considers the markup and won't consider any visual changes introduced via CSS or JavaScript. Only a few core blocks are supported in this initial PR. The API can easily support more HTML elements and blocks.
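To make the idea of mapping HTML elements to blocks concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the function names `sketch_block_for_tag()` and `sketch_wrap_in_block()` are hypothetical, only paragraphs and headings are handled, and the serialization is a simplification of what the block editor produces.

```php
<?php
// Hypothetical sketch, not the WP_HTML_To_Blocks implementation from this PR.

/**
 * Returns array( block name, block attributes ) for a supported tag,
 * or null when the tag has no block mapping in this sketch.
 */
function sketch_block_for_tag( string $tag ): ?array {
	$tag = strtoupper( $tag );
	if ( 'P' === $tag ) {
		return array( 'paragraph', array() );
	}
	// H1 through H6 map to a heading block carrying its level.
	if ( 1 === preg_match( '/^H([1-6])$/', $tag, $matches ) ) {
		return array( 'heading', array( 'level' => (int) $matches[1] ) );
	}
	return null;
}

/**
 * Wraps already-serialized inner HTML in block comment delimiters.
 * Attributes, when present, are JSON-encoded into the opening delimiter.
 */
function sketch_wrap_in_block( string $block_name, array $attrs, string $inner_html ): string {
	$json = empty( $attrs ) ? '' : ' ' . json_encode( $attrs );
	return "<!-- wp:{$block_name}{$json} -->\n{$inner_html}\n<!-- /wp:{$block_name} -->";
}

// Usage: look up the block for a tag, then wrap the element's markup.
list( $name, $attrs ) = sketch_block_for_tag( 'p' );
echo sketch_wrap_in_block( $name, $attrs, '<p>Hello <b>world</b>!</p>' ), "\n";
```

Supporting another element in a design like this would mean adding one more branch to the lookup, which is one way to read the claim that the API can be extended to more elements and blocks.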
To preserve visual fidelity between the original HTML page and the produced block markup, we'll need an annotated HTML input produced by the Try WordPress browser extension. It would contain each element's colors, sizes, etc. We cannot possibly get all of that from just analyzing the HTML on the server without building a full-blown, browser-like HTML renderer in PHP, and I know I'm not building one.
A part of #1894
Example
Caveats
I had to patch WP_HTML_Processor to stop bailing out on <meta> tags referencing the document charset. Ideally we'd patch WordPress core to stop bailing out when the charset is UTF-8.
Testing instructions
This PR mostly adds new code. Just confirm the unit tests pass in CI.
cc @brandonpayton @zaerl @sirreal @dmsnell @ellatrix