Explore Data Liberation exporter #2078

Draft: wants to merge 6 commits into trunk

@brandonpayton (Member) commented Dec 13, 2024

Motivation for the change, related issues

This PR explores a basic export that creates a ZIP containing a full WXR export and all uploads.

Related to #2055

cc @adamziel

Implementation details

  • Uses the WordPress core WXR exporter
  • Puts the WXR output in a ZIP
  • Rewrites URLs to reference local files (e.g. file://./wp-content/uploads)
  • Puts the uploads in that same ZIP, nested under wp-content/uploads
  • Streams the files straight to the client instead of buffering the ZIP in memory
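The last bullet is the least obvious part, so here is a minimal sketch of how a ZIP can be written to a stream without buffering it. It is an illustration under stated assumptions, not the `ZipStreamWriter` this PR uses: the class name `MinimalZipStream` is hypothetical, entries are stored uncompressed, there is no ZIP64 support (so everything must stay under 4 GiB), and every entry gets a fixed 1980-01-01 timestamp. Local headers and file bytes go straight to the output stream; only the small central-directory records are held in memory until `finish()` appends them.

```php
<?php
/**
 * Hypothetical MinimalZipStream: a sketch of a streaming ZIP writer, NOT the
 * ZipStreamWriter used in this PR. Assumes stored (uncompressed) entries and
 * no ZIP64, so each entry and the whole archive must stay under 4 GiB.
 */
final class MinimalZipStream
{
    private const DOS_DATE_1980_01_01 = 0x21; // fixed timestamp keeps the sketch short
    private int $offset = 0;      // bytes written so far = offset of the next local header
    private array $central = [];  // small central-directory records, kept until finish()

    /** @param resource $out Any writable stream, e.g. fopen('php://output', 'wb'). */
    public function __construct(private $out) {}

    /** Add an entry from a string, e.g. a generated WXR document. */
    public function addFromString(string $name, string $data): void
    {
        $this->writeLocalHeader($name, crc32($data), strlen($data));
        $this->write($data);
    }

    /** Add an upload from disk, copied in 64 KiB chunks so it is never fully in memory. */
    public function addFromPath(string $name, string $path): void
    {
        // First pass: hash_file() computes the CRC-32 incrementally from disk.
        $crc = (int) hexdec(hash_file('crc32b', $path));
        $this->writeLocalHeader($name, $crc, filesize($path));
        // Second pass: copy the bytes straight to the output stream.
        $in = fopen($path, 'rb');
        while (($chunk = fread($in, 65536)) !== false && $chunk !== '') {
            $this->write($chunk);
        }
        fclose($in);
    }

    /** Write the central directory and the end record; the archive is then complete. */
    public function finish(): void
    {
        $cd_offset = $this->offset;
        foreach ($this->central as $record) {
            $this->write($record);
        }
        $n = count($this->central);
        $this->write(pack('VvvvvVVv', 0x06054b50, 0, 0, $n, $n,
            $this->offset - $cd_offset, $cd_offset, 0));
    }

    private function writeLocalHeader(string $name, int $crc, int $size): void
    {
        // Flag 0x0800 marks the file name as UTF-8; method 0 means "stored".
        $this->central[] = pack('VvvvvvvVVVvvvvvVV', 0x02014b50, 20, 20, 0x0800, 0, 0,
            self::DOS_DATE_1980_01_01, $crc, $size, $size, strlen($name), 0, 0, 0, 0, 0,
            $this->offset) . $name;
        $this->write(pack('VvvvvvVVVvv', 0x04034b50, 20, 0x0800, 0, 0,
            self::DOS_DATE_1980_01_01, $crc, $size, $size, strlen($name), 0) . $name);
    }

    private function write(string $bytes): void
    {
        fwrite($this->out, $bytes);
        $this->offset += strlen($bytes);
    }
}
```

In the exporter, `$out` would be `fopen('php://output', 'wb')`, as in the code under review, so the client starts receiving bytes before the archive is finished. Computing each upload's CRC-32 up front with `hash_file()` costs a second read from disk; the alternative is general-purpose flag bit 3 plus a trailing data descriptor, which avoids the extra read but is rejected for stored entries by some streaming readers such as Java's `ZipInputStream`.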

Testing Instructions (or ideally a Blueprint)

TBD

For manual testing, this PR adds a temporary endpoint that returns a site's full export ZIP: send a GET request to /_data_liberation_test_export.

@brandonpayton added the [Type] Exploration, [Feature] Import Export, and [Aspect] Data Liberation labels Dec 13, 2024
@brandonpayton requested a review from a team December 13, 2024 04:04
@brandonpayton (Member Author) commented:

Note: This PR currently does a simple str_replace() of attachment URLs, which definitely breaks GUIDs and maybe other things. 🙀

So a next step is to make targeted replacements using a proper XML parser and real URL replacement.
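A targeted replacement could look roughly like the sketch below, which assumes WXR 1.2's `wp:` namespace (`http://wordpress.org/export/1.2/`). Both function names are hypothetical. It rewrites only `<wp:attachment_url>` elements and leaves `<guid>` untouched, and it parses URLs instead of matching substrings, so `.../uploads-old/a.jpg` is not mistaken for a file under `.../uploads/`. `DOMDocument` loads the whole document into memory; a streaming exporter would need a pull parser such as `XMLReader` paired with `XMLWriter`, but the targeting logic would be the same.

```php
<?php
/**
 * Hypothetical helper: map $url to file://./wp-content/uploads/... only when it
 * really points inside the uploads directory. Returns null otherwise.
 * The scheme is ignored, so http:// and https:// links both match; query
 * strings and fragments are dropped.
 */
function rewrite_upload_url(string $url, string $uploads_base_url): ?string
{
    $u = parse_url($url);
    $base = parse_url(rtrim($uploads_base_url, '/'));
    if (!$u || !$base || !isset($u['host'], $u['path'], $base['host'], $base['path'])) {
        return null;
    }
    // Hosts compare case-insensitively; the path must match whole segments.
    if (strcasecmp($u['host'], $base['host']) !== 0
        || !str_starts_with($u['path'], $base['path'] . '/')) {
        return null;
    }
    return 'file://./wp-content/uploads' . substr($u['path'], strlen($base['path']));
}

/** Hypothetical: rewrite <wp:attachment_url> values only, leaving <guid> alone. */
function rewrite_attachment_urls(string $wxr, string $uploads_base_url): string
{
    $doc = new DOMDocument();
    $doc->loadXML($wxr);
    $wp_ns = 'http://wordpress.org/export/1.2/'; // assumed WXR 1.2 "wp:" namespace
    foreach ($doc->getElementsByTagNameNS($wp_ns, 'attachment_url') as $el) {
        $new = rewrite_upload_url(trim($el->textContent), $uploads_base_url);
        if ($new !== null) {
            $el->textContent = $new; // replaces the CDATA child with an escaped text node
        }
    }
    return $doc->saveXML();
}
```

This does not touch URLs embedded in post bodies (`content:encoded`); those are HTML inside XML and would need an HTML-aware pass, for example with WordPress's `WP_HTML_Tag_Processor`.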

@brandonpayton marked this pull request as draft December 13, 2024 14:08
```php
// Default to streaming the ZIP straight into the HTTP response body.
if ( ! $output_stream ) {
	$output_stream = fopen( 'php://output', 'wb' );
}
$zip_writer = new ZipStreamWriter( $output_stream );
```

@adamziel (Collaborator) commented Dec 13, 2024:

Let's try packaging the data using the same WP_Entity objects as the importer. We could then have a single streaming export pipeline that deals with entities on one end and uses arbitrary export drivers on the other, e.g. WXR, Markdown, HTML, etc.

Even more importantly, we could serialize the exported entities, send them over the wire, and import them without relying on any particular data format. That's important for the site sync protocol and for things like the Try WordPress extension. We could also extend it to more data types, e.g. SQL dumps and Blueprint steps.
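One way to picture the entity-based pipeline is the sketch below. Everything in it is hypothetical: `ExampleEntity` stands in for the importer's `WP_Entity` objects (whose real shape may differ), and `ExportDriver`, `JsonLinesDriver`, `MarkdownDriver`, `run_export()` and `decode_entities()` are illustrative names. The pipeline only iterates entities and hands each one to a driver; JSON Lines serves as an example of a format-neutral wire encoding that a receiving site could turn back into entities without knowing anything about WXR.

```php
<?php
/** Hypothetical stand-in for the importer's WP_Entity objects; the real shape may differ. */
final class ExampleEntity
{
    public function __construct(
        public readonly string $type, // e.g. 'post', 'post_meta', 'term'
        public readonly array $data   // plain, JSON-serializable fields
    ) {}
}

/** Hypothetical driver contract: receive entities one at a time, emit one format. */
interface ExportDriver
{
    public function write(ExampleEntity $entity): void;
}

/** Format-neutral wire encoding: one JSON object per line (JSON Lines). */
final class JsonLinesDriver implements ExportDriver
{
    /** @param resource $out */
    public function __construct(private $out) {}

    public function write(ExampleEntity $entity): void
    {
        $row = ['type' => $entity->type, 'data' => $entity->data];
        fwrite($this->out, json_encode($row, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR) . "\n");
    }
}

/** A human-oriented driver that only cares about posts. */
final class MarkdownDriver implements ExportDriver
{
    /** @param resource $out */
    public function __construct(private $out) {}

    public function write(ExampleEntity $entity): void
    {
        if ($entity->type === 'post') {
            fwrite($this->out, '# ' . $entity->data['title'] . "\n\n" . $entity->data['content'] . "\n\n");
        }
    }
}

/** The pipeline knows only about entities; the driver decides the output format. */
function run_export(iterable $entities, ExportDriver $driver): void
{
    foreach ($entities as $entity) {
        $driver->write($entity);
    }
}

/** Receiving side: rebuild entities from the wire without knowing about WXR. */
function decode_entities($in): Generator
{
    while (($line = fgets($in)) !== false) {
        $row = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        yield new ExampleEntity($row['type'], $row['data']);
    }
}
```

Adding another output (WXR, HTML, an SQL dump) would mean adding another driver; `run_export()` itself would not change.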

@brandonpayton (Member Author) replied:

@adamziel, I don't know exactly what this means yet, but I'll look at the importer work for reference.

@brandonpayton (Member Author) replied:

I spoke with Adam. What we're talking about is basically a WP_Entity iterator API for reading WordPress entities from a site; that iterator can then be used to implement multiple exporters.
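A rough sketch of what such an entity iterator might look like, using PHP generators. `iterate_entities()`, `iterate_site_entities()` and `summarize()` are hypothetical names, and `$fetch_page` is a stand-in for a real paged query (for example against `wp_posts`) that returns up to `$page_size` rows with `ID` greater than `$after_id`, ordered by `ID`. Entities are plain arrays here to keep the example self-contained.

```php
<?php
/**
 * Hypothetical entity iterator. $fetch_page stands in for a real paged query
 * (e.g. against wp_posts): given the last seen ID and a limit, it returns up to
 * $page_size rows with a larger ID, ordered by ID.
 */
function iterate_entities(string $type, callable $fetch_page, int $page_size = 100): Generator
{
    $after_id = 0;
    while (true) {
        $rows = $fetch_page($after_id, $page_size);
        foreach ($rows as $row) {
            yield ['type' => $type, 'data' => $row];
            $after_id = $row['ID']; // keyset cursor: the next page starts after this row
        }
        if (count($rows) < $page_size) {
            return; // a short page means there is nothing left
        }
    }
}

/** Chain several entity streams (posts, then attachments, ...) into one. */
function iterate_site_entities(iterable ...$streams): Generator
{
    foreach ($streams as $stream) {
        yield from $stream;
    }
}

/** A trivial "exporter" consuming the iterator: tally entities per type. */
function summarize(iterable $entities): array
{
    $counts = [];
    foreach ($entities as $entity) {
        $counts[$entity['type']] = ($counts[$entity['type']] ?? 0) + 1;
    }
    return $counts;
}
```

Because every stream is a generator, nothing is queried until an exporter starts iterating, and only one page per source is held in memory at a time. Paging by last-seen `ID` rather than `OFFSET` keeps each page query cheap on large tables and gives a natural cursor for resuming an interrupted export.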

@brandonpayton (Member Author) replied:

The plan for this PR is to tweak URL replacement so it works properly, then leave the PR open as a draft until it can be replaced by a proper exporter built on the entity iterator.

Labels: [Aspect] Data Liberation, [Feature] Import Export, [Type] Exploration (An exploration that may or may not result in mergeable code)
Project status: Needs review
2 participants