[Data Liberation] WP_WXR_Reader #1972

Merged Nov 2, 2024 (27 commits). The diff shown covers changes from 1 commit.

Commits
528cfd5 [Data Liberation] WP_WXR_Processor (adamziel, Oct 31, 2024)
428fb53 Refactor WXR_Action to WXR_Object (adamziel, Oct 31, 2024)
f013cc5 Rename WXR object $name (adamziel, Oct 31, 2024)
5aec8a0 Parse WXR attachments (adamziel, Oct 31, 2024)
654072f Document the next steps for the WP_WXR_Processor (adamziel, Oct 31, 2024)
65a84ed Add support for terms (adamziel, Oct 31, 2024)
4f1f7e5 Support wp:tag (adamziel, Nov 1, 2024)
6dedf9a Make WP_WXR_Processor streaming (adamziel, Nov 1, 2024)
3cba344 Retain last post ID and last comment ID when parsing nested objects (adamziel, Nov 1, 2024)
76e2210 Get it to run without exceptions on theme unit test XML file (adamziel, Nov 1, 2024)
9fd6442 Add smoke tests for parsing existing WXR files used out in the wild (adamziel, Nov 1, 2024)
b9b8f66 Bring in WXR files from the WP PHPunit repo (adamziel, Nov 1, 2024)
47d9dc9 Add streaming tests (adamziel, Nov 1, 2024)
58f62a4 Make progress on the failing importer test (adamziel, Nov 1, 2024)
adf6eab Parse XML names using @dmsnell's UTF-8 decoder (adamziel, Nov 1, 2024)
71b7500 Simplify the next_entity() logic in the WXR processor (adamziel, Nov 2, 2024)
004015c Cleanup (adamziel, Nov 2, 2024)
f79e80d Lint (adamziel, Nov 2, 2024)
6041871 Rename WP_WXR_Processor to WP_WXR_Reader (adamziel, Nov 2, 2024)
3cfb82f Adjust the entity keys returned by WP_WXR_Reader (adamziel, Nov 2, 2024)
3da4b91 Expand the documentation inside parse_name (adamziel, Nov 2, 2024)
71a0487 Add documentation strings to WP_WXR_Reader (adamziel, Nov 2, 2024)
9f1b362 Add XML memory budget and auto-flush the buffer periodically (adamziel, Nov 2, 2024)
a475658 Lint (adamziel, Nov 2, 2024)
f7f4df2 Give @dmsnell credit for his fantastic UTF8 decoder (adamziel, Nov 2, 2024)
42b46b6 Add more meaningful inline documentation (adamziel, Nov 2, 2024)
efaacf9 Lint (adamziel, Nov 2, 2024)
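Several commits above make the reader streaming and route XML name parsing through @dmsnell's UTF-8 decoder. One reason a streaming parser needs care here: when the XML arrives in chunks, a chunk boundary can fall in the middle of a multibyte character, and the trailing bytes look invalid until the next chunk arrives. The sketch below shows one simple way to cope with that by holding back an incomplete trailing sequence. It is a hypothetical illustration, not the PR's decoder: `utf8_complete_prefix_length()` and `Utf8ChunkBuffer` are made-up names, it assumes PHP 8, and it only locates a safe character boundary rather than decoding or fully validating code points.

```php
<?php
/**
 * Hypothetical helper (not part of the PR): returns how many leading bytes of
 * $bytes end on a UTF-8 character boundary. Trailing bytes that start a
 * multibyte sequence but do not complete it are excluded, so the caller can
 * hold them back until more input arrives. Does not use mbstring.
 */
function utf8_complete_prefix_length( string $bytes ): int {
	$len           = strlen( $bytes );
	$i             = $len - 1;
	$continuations = 0;

	// A UTF-8 sequence is at most 4 bytes, so skip back over at most
	// 3 continuation bytes (bit pattern 10xxxxxx) to find the lead byte.
	while ( $i >= 0 && $continuations < 3 && ( ord( $bytes[ $i ] ) & 0xC0 ) === 0x80 ) {
		--$i;
		++$continuations;
	}

	if ( $i < 0 ) {
		// Empty input, or only continuation bytes: nothing sensible to hold back.
		return $len;
	}

	// Work out how long the sequence starting at the lead byte should be.
	$lead = ord( $bytes[ $i ] );
	if ( $lead < 0x80 ) {
		$needed = 1; // 0xxxxxxx: ASCII.
	} elseif ( ( $lead & 0xE0 ) === 0xC0 ) {
		$needed = 2; // 110xxxxx.
	} elseif ( ( $lead & 0xF0 ) === 0xE0 ) {
		$needed = 3; // 1110xxxx.
	} elseif ( ( $lead & 0xF8 ) === 0xF0 ) {
		$needed = 4; // 11110xxx.
	} else {
		return $len; // Invalid lead byte: leave it for a real decoder to reject.
	}

	// Hold back the tail only if the final sequence is still incomplete.
	return ( $len - $i ) < $needed ? $i : $len;
}

/**
 * Hypothetical chunk buffer: carries an incomplete trailing UTF-8 sequence
 * over from one push() call to the next.
 */
final class Utf8ChunkBuffer {
	private string $carry = '';

	/** Returns the bytes that are safe to decode and keeps the incomplete tail. */
	public function push( string $chunk ): string {
		$bytes       = $this->carry . $chunk;
		$cut         = utf8_complete_prefix_length( $bytes );
		$this->carry = substr( $bytes, $cut );
		return substr( $bytes, 0, $cut );
	}

	/** Bytes still waiting for the rest of their sequence. */
	public function pending(): string {
		return $this->carry;
	}
}
```

For example, if "é" (bytes C3 A9) is split across two chunks, `push( "caf\xC3" )` returns `caf` and holds back `"\xC3"`; the next `push( "\xA9!" )` returns `"\xC3\xA9!"` ("é!"). Scanning back at most three bytes is enough because no UTF-8 sequence is longer than four bytes, and malformed bytes are passed through untouched so that a real decoder can report them.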
packages/playground/data-liberation/src/utf8_decoder.php (7 additions, 0 deletions)
@@ -1,4 +1,11 @@
<?php
/**
 * UTF-8 decoding pipeline by Dennis Snell (@dmsnell), originally
 * proposed in https://github.com/WordPress/wordpress-develop/pull/6883.
 *
 * It enables parsing XML documents with incomplete UTF-8 byte sequences
 * without crashing or depending on the mbstring extension.
 */

if ( ! defined( 'UTF8_DECODER_ACCEPT' ) ) {
	define( 'UTF8_DECODER_ACCEPT', 0 );