This repository was archived by the owner on May 30, 2025. It is now read-only.

[Blueprints] Import WXRs via the DataLiberation importer (first stab) #25

Open
wants to merge 6 commits into
base: trunk
from


@adamziel (Contributor) commented May 29, 2025

Adapts the import-markdown-directory.php script from the create-wp-site tool so that ImportContentStep can start using the Data Liberation importer in Blueprints v2.

For example, this Blueprint would use the Data Liberation importer:

{
	"version": 2,
	"content": [
		{
			"type": "wxr",
			"source": "https://raw.githubusercontent.com/wordpress/blueprints/trunk/blueprints/stylish-press/site-content.wxr"
		}
	]
}
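To illustrate how such a Blueprint could be routed, here is a minimal sketch (in Python, not the runner's actual PHP) that picks out the "content" entries of type "wxr". Only the "version", "content", "type", and "source" keys come from the example above; the function name `wxr_sources` and the version check are hypothetical.

```python
import json


def wxr_sources(blueprint_json: str) -> list:
    """Hypothetical helper: list the WXR sources a v2 Blueprint asks to import."""
    blueprint = json.loads(blueprint_json)
    if blueprint.get("version") != 2:
        raise ValueError("expected a version 2 Blueprint")
    # Only "wxr" entries go to the Data Liberation importer in this sketch;
    # other content types (e.g. "posts") are left to other handlers.
    return [
        entry["source"]
        for entry in blueprint.get("content", [])
        if entry.get("type") == "wxr"
    ]


example = """
{
    "version": 2,
    "content": [
        {
            "type": "wxr",
            "source": "https://raw.githubusercontent.com/wordpress/blueprints/trunk/blueprints/stylish-press/site-content.wxr"
        }
    ]
}
"""

if __name__ == "__main__":
    print(wxr_sources(example))
```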

In this initial exploration:

  • The runner puts php-toolkit.phar in the target site directory
  • The step handler creates the import-markdown-directory.php script in the target site directory
  • The step handler buffers the entire WXR file into the target site directory
  • The step handler runs the import-markdown-directory.php script in a subprocess, pointing it to the buffered WXR file
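The four steps above can be sketched roughly as follows. This is a Python approximation of the flow, not the PR's PHP step handler: the script and phar names come from this description, while the buffered file name `import.wxr`, the `run_import` helper, and the script's command-line arguments are assumptions. The `runner` parameter lets the subprocess call be swapped out for a dry run.

```python
import subprocess
import tempfile
from pathlib import Path

SCRIPT_NAME = "import-markdown-directory.php"
# Step 1 is assumed done: the runner already placed this phar in site_dir.
TOOLKIT_PHAR = "php-toolkit.phar"
BUFFERED_WXR = "import.wxr"  # hypothetical file name


def run_import(wxr_bytes, site_dir, script_source, runner=subprocess.run, php="php"):
    site_dir = Path(site_dir)
    # Step 2: write the importer script into the target site directory.
    script_path = site_dir / SCRIPT_NAME
    script_path.write_text(script_source)
    # Step 3: buffer the entire WXR file into the same directory.
    wxr_path = site_dir / BUFFERED_WXR
    wxr_path.write_bytes(wxr_bytes)
    # Step 4: run the script in a subprocess, pointing it at the buffered WXR.
    # The argument order here is an assumption, not the script's documented CLI.
    cmd = [php, str(script_path), str(wxr_path)]
    return runner(cmd, cwd=site_dir, check=True)


if __name__ == "__main__":
    # Dry run: print the command instead of invoking PHP.
    with tempfile.TemporaryDirectory() as site:
        run_import(
            b'<rss version="2.0"></rss>',
            site,
            "<?php /* importer script */",
            runner=lambda cmd, **kwargs: print(cmd),
        )
```

Buffering the whole file keeps this first stab simple, at the cost of holding a full copy of the WXR on disk before the import starts.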

Remaining work

  • Decide where a production blueprints.phar release sources php-toolkit.phar from. GitHub Releases, maybe?

Follow-up work

  • Remove Data Liberation files from the blueprints.phar build to trim it down to ~400KB
  • Pass configuration options to the importer, e.g. allowed media domains, author mapping mode, etc.
  • Stream the WXR file directly from its reference. Do not buffer it first. Ditto for the "type": "posts" import mode.
  • Use the Data Liberation importer for importing posts as well.
  • Pass progress updates from the importer script to the Blueprint runner
  • Use the same data reference resolution mechanism in Blueprints and in the importer to support sourcing media files from any remote execution context (e.g. a git repo).
    • Idea: Don't process downloads in the subprocess. Instead, use message passing to run all the downloads from the parent process.
      • Downside: Message-passing complexity.
      • Upsides: Resource management is centralized. We can resolve things once without worrying about duplication between the parent process and the child. No need for exceptions such as "Do not fetch these assets in the runner, let the child process handle that". We'd automatically respect execution context boundaries.
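The message-passing idea, combined with progress reporting, could look roughly like this. It is a hedged sketch, not a design from this PR: two threads and `queue.Queue` stand in for the Blueprint runner and the importer subprocess (a real setup would likely exchange serialized messages over pipes), and the "fetch"/"progress"/"done" message shapes and the `resolve` callback are hypothetical.

```python
import queue
import threading


def importer_child(to_parent, from_parent, media_urls):
    """Stand-in for the importer subprocess: it never downloads anything itself."""
    total = len(media_urls)
    for done, url in enumerate(media_urls, start=1):
        to_parent.put({"type": "fetch", "url": url})
        reply = from_parent.get()  # block until the parent answers
        # A real importer would store reply["data"] as an attachment here.
        to_parent.put({"type": "progress", "done": done, "total": total,
                       "ok": reply["data"] is not None})
    to_parent.put({"type": "done"})


def runner_parent(resolve, media_urls):
    """Stand-in for the Blueprint runner: owns every download and all progress."""
    to_parent, from_parent = queue.Queue(), queue.Queue()
    child = threading.Thread(target=importer_child,
                             args=(to_parent, from_parent, media_urls))
    child.start()
    cache, progress = {}, []
    while True:
        msg = to_parent.get()
        if msg["type"] == "fetch":
            url = msg["url"]
            if url not in cache:  # each reference is resolved once, in one place
                cache[url] = resolve(url)  # hypothetical resolver: bytes or None
            from_parent.put({"data": cache[url]})
        elif msg["type"] == "progress":
            progress.append((msg["done"], msg["total"], msg["ok"]))
        elif msg["type"] == "done":
            break
    child.join()
    return progress


if __name__ == "__main__":
    files = {"https://example.com/a.jpg": b"jpeg-bytes"}  # pretend data source
    urls = ["https://example.com/a.jpg", "https://example.com/a.jpg",
            "https://example.com/missing.png"]
    print(runner_parent(files.get, urls))
```

The blocking request/reply loop is the "message-passing complexity" downside in miniature, while the single cache in the parent shows the upside: a repeated reference is resolved only once, and the child never needs network access of its own.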

@adamziel adamziel changed the title [Blueprints] first stab at importing WXRs using the DataLiberation importer [Blueprints] Import WXRs via the DataLiberation importer (first stab) May 29, 2025