StreamChain: An API for streams-processing data (e.g. HTTP → ZIP → XML → HTML) #1

Closed
wants to merge 72 commits into from
Changes from 68 commits
3650b18
Stream rewrite URLs in a remote WXR file
adamziel Jul 15, 2024
d44f701
Experiment with pipe interface
adamziel Jul 16, 2024
736783f
Use pipes for rewrite-remote-wxr.php
adamziel Jul 16, 2024
22dea1d
Remove needs_more() method from WritableStream
adamziel Jul 16, 2024
b4290f0
Filtering and demultiplexing via metadata piping
adamziel Jul 16, 2024
7596794
Use a fancier pipe
adamziel Jul 16, 2024
17c5950
Explore automatic demultiplexing based on the stream class definition
adamziel Jul 20, 2024
86316f3
Fix the intermittent broken pipe
adamziel Jul 20, 2024
542ac35
Explore streams as iterators
adamziel Jul 20, 2024
75f1217
Explore PipeContext
adamziel Jul 20, 2024
9143dc0
Implement context hierarchy and skipping upstream files
adamziel Jul 21, 2024
1a47f08
Use the word "file", not "resource"
adamziel Jul 21, 2024
e360d08
Use consistent file_* naming for file-related methods
adamziel Jul 21, 2024
ceceac5
Explore Unix-like stdin, stderr, stdout-based piping approach.
adamziel Jul 21, 2024
855eb14
Rename Composite to ShellCommandsChain to refer to 'cat | sort'
adamziel Jul 21, 2024
6e4aae5
Experiment with propagating errors
adamziel Jul 22, 2024
6858ca5
Propagate both the bytes and the error details, use the pipe eof stat…
adamziel Jul 22, 2024
aab911b
Add a missing null check
adamziel Jul 22, 2024
1760251
Ability to reassign stdin and stdout
adamziel Jul 22, 2024
14950cd
Add streaming ZIP reader and the ability to skip files in the upstrea…
adamziel Jul 22, 2024
827b28d
Make Demultiplexer work with multiple zip files
adamziel Jul 22, 2024
fb0a44e
Use ::stream methods to declare a pipe
adamziel Jul 22, 2024
71b9be7
Accept process instance in ProcessManager::spawn()
adamziel Jul 22, 2024
a3f93af
Add $process->run(); method to exhaust the entire stdin stream.
adamziel Jul 22, 2024
c516336
Add a long list of todos
adamziel Jul 22, 2024
a57c2aa
Remove set_write_channel and other similar methods
adamziel Jul 22, 2024
e14f7bb
Remove ProcessManager
adamziel Jul 22, 2024
b8804f4
Simplify channel management – use the 'channel' metadata parameter
adamziel Jul 22, 2024
fb51807
Process the entire stdin and stdout at each stage
adamziel Jul 22, 2024
cc80840
Use fresh stream metadata on each tick
adamziel Jul 22, 2024
ba47244
MultiChannelPipe: overwrite metadata on write
adamziel Jul 22, 2024
bf1145e
Document the channel_id choice used in ZipReaderProcess
adamziel Jul 22, 2024
9e22a23
Process all downstream chunks before pulling more upstream chunks
adamziel Jul 23, 2024
bf37654
Update the note about Pipe and MultiChannelPipe
adamziel Jul 23, 2024
e856fbe
Add more thoughts about the streamable interface
adamziel Jul 23, 2024
9057f18
Add more thoughts about demultiplexing
adamziel Jul 23, 2024
485ab61
Make $pipe->read() return true on success and add a $pipe->consume_by…
adamziel Jul 23, 2024
da29e85
Add return type declarations
adamziel Jul 23, 2024
f557abc
Ramble more in the todo comment
adamziel Jul 23, 2024
765bcfc
Rely on simple BufferedPipes, don't use MultiplexedPipe
adamziel Jul 23, 2024
7601985
Make ProcessChain an iterator
adamziel Jul 23, 2024
bd0e812
Add commentary about merging pipes and processes
adamziel Jul 23, 2024
c356525
Expose the Process instance as $context, expose direct access to the …
adamziel Jul 30, 2024
5b6ac94
Unwrap the contextual process from Demultiplexer
adamziel Jul 30, 2024
9ec21c7
Make fds protected, not public. Make TransformProcessor a descendant …
adamziel Jul 30, 2024
937afad
Rename stdin/stdout/stderr to input/output/errors
adamziel Jul 30, 2024
ddab380
Remove the concept of reaping processes
adamziel Jul 30, 2024
312cbf1
Separate the crash() and finish() methods
adamziel Jul 30, 2024
342f927
Rename Process to Stream
adamziel Jul 30, 2024
f925c43
Inline the XML rewriting callback
adamziel Jul 30, 2024
fb95720
Make all public fields protected or private
adamziel Jul 30, 2024
73369ba
Finish renaming processes to streams
adamziel Jul 30, 2024
04b2e4c
Replace stream factory method with a static create() method
adamziel Jul 30, 2024
8bc278c
Rename some more process variables and methods to stream taxonomy
adamziel Jul 30, 2024
9649103
Try a vastly simplified approach in pipes-unified.php
adamziel Jul 30, 2024
2fed566
Rename next_chunk() to next_bytes()
adamziel Jul 30, 2024
c83c104
Add a get_bytes() interface method
adamziel Jul 30, 2024
b4eced3
Separate IFilesStream
adamziel Jul 30, 2024
c07da9b
Group IByteSteram methods
adamziel Jul 30, 2024
20f78ed
Explore an approach based on creating a Stream class and passing a ha…
adamziel Jul 30, 2024
0650f8a
Move more logic methods to Byte_Stream
adamziel Jul 30, 2024
9d4f77c
Call append_eof() on stream, not stream state
adamziel Jul 30, 2024
0a8ccc3
Simplify ProcessorByteStream-based streams
adamziel Jul 30, 2024
57e6594
Rename tick() to generate_next_chunk()
adamziel Jul 31, 2024
c1d8ed3
Small refactor
adamziel Jul 31, 2024
98c2e36
Inheritance: Byte_Stream > Callback_Byte_Stream > Processor_Byte_Stream
adamziel Jul 31, 2024
97bdc31
Rename ZIP_Processor to ZIP_Reader
adamziel Jul 31, 2024
1234a10
Return StreamChain from the current() method
adamziel Jul 31, 2024
bd19ad7
Add a "main loop" that processes each stage of the pipeline explicitly
adamziel Aug 27, 2024
daaba8a
A loop-based API without nested loops
adamziel Aug 27, 2024
3c07f99
Prototype pause() and resume() methods to make the stream processing …
adamziel Sep 30, 2024
b7102b7
Prototype a reentrant ZipStreamReaderLocal
adamziel Sep 30, 2024
8 changes: 8 additions & 0 deletions IStreamProcessor.php
@@ -0,0 +1,8 @@
<?php

interface IStreamProcessor {
    public function append_bytes(string $bytes);
    public function is_finished(): bool;
    public function is_paused_at_incomplete_input(): bool;
    public function get_last_error(): ?string;
}
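A minimal sketch of how a class might implement and be driven through this interface. `Line_Collector`, its `get_lines()` method, and the "END" terminator are hypothetical and exist only for illustration. The reading of `is_paused_at_incomplete_input()` as "all input consumed, more is needed to make progress" is an assumption based on the method name, not something the PR states.

```php
<?php

// Same shape as the interface added in this PR, repeated so the sketch runs on its own.
interface IStreamProcessor {
    public function append_bytes(string $bytes);
    public function is_finished(): bool;
    public function is_paused_at_incomplete_input(): bool;
    public function get_last_error(): ?string;
}

/**
 * Hypothetical processor (not part of the PR): collects newline-terminated
 * lines and finishes once it sees a line that reads "END".
 */
class Line_Collector implements IStreamProcessor {
    private $buffer     = '';
    private $lines      = array();
    private $finished   = false;
    private $last_error = null;

    public function append_bytes(string $bytes) {
        if ($this->finished) {
            $this->last_error = 'Cannot append bytes to a finished processor';
            return;
        }
        $this->buffer .= $bytes;
        // Consume every complete line currently sitting in the buffer.
        while (false !== ($newline_at = strpos($this->buffer, "\n"))) {
            $line         = substr($this->buffer, 0, $newline_at);
            $this->buffer = (string) substr($this->buffer, $newline_at + 1);
            if ('END' === $line) {
                $this->finished = true;
                return;
            }
            $this->lines[] = $line;
        }
    }

    public function is_finished(): bool {
        return $this->finished;
    }

    // Assumed meaning: not done, not failed, waiting for more bytes.
    public function is_paused_at_incomplete_input(): bool {
        return !$this->finished && null === $this->last_error;
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }

    public function get_lines(): array {
        return $this->lines;
    }
}

// Feed chunks that split lines at arbitrary byte offsets, as a network stream would.
$processor = new Line_Collector();
foreach (array("alpha\nbe", "ta\nEN", "D\nignored") as $chunk) {
    $processor->append_bytes($chunk);
    if ($processor->is_finished() || null !== $processor->get_last_error()) {
        break;
    }
}

echo implode(', ', $processor->get_lines()), "\n"; // prints "alpha, beta"
```

The caller only pushes bytes and polls three status methods, so the same loop could drive any processor that implements the interface, whatever it parses.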
2 changes: 1 addition & 1 deletion blueprints-library
Submodule blueprints-library updated from e08b22 to 3b8943
54 changes: 54 additions & 0 deletions bootstrap.php
@@ -0,0 +1,54 @@
<?php

// Where to find the streaming WP_XML_Processor
// Use a version from this PR: https://github.com/adamziel/wordpress-develop/pull/43
define('WP_XML_API_PATH', __DIR__ );
define('SITE_TRANSFER_PROTOCOL_PATH', __DIR__ . '/site-transfer-protocol' );
define('BLUEPRINTS_LIB_PATH', __DIR__ . '/blueprints-library/src/WordPress' );
if (!file_exists(WP_XML_API_PATH . '/class-wp-token-map.php')) {
    copy(WP_XML_API_PATH . '/../class-wp-token-map.php', WP_XML_API_PATH . '/class-wp-token-map.php');
}

$requires[] = WP_XML_API_PATH . "/IStreamProcessor.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-token.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-span.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-text-replacement.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-decoder.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-attribute-token.php";

$requires[] = WP_XML_API_PATH . "/class-wp-html-decoder.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-tag-processor.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-open-elements.php";
$requires[] = WP_XML_API_PATH . "/class-wp-token-map.php";
$requires[] = WP_XML_API_PATH . "/html5-named-character-references.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-active-formatting-elements.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-processor-state.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-unsupported-exception.php";
$requires[] = WP_XML_API_PATH . "/class-wp-html-processor.php";

$requires[] = WP_XML_API_PATH . "/class-wp-xml-decoder.php";
$requires[] = WP_XML_API_PATH . "/class-wp-xml-tag-processor.php";
$requires[] = WP_XML_API_PATH . "/class-wp-xml-processor.php";
$requires[] = WP_XML_API_PATH . "/class-wp-wxr-normalizer.php";
$requires[] = WP_XML_API_PATH . "/functions.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/Streams/StreamWrapperInterface.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/Streams/StreamWrapper.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/Streams/StreamPeekerWrapper.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/Request.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/Response.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/HttpError.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/Connection.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/Client.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/StreamWrapper/ChunkedEncodingWrapper.php";
$requires[] = BLUEPRINTS_LIB_PATH . "/AsyncHttp/StreamWrapper/InflateStreamWrapper.php";

$requires[] = SITE_TRANSFER_PROTOCOL_PATH . '/src/WP_Block_Markup_Processor.php';
$requires[] = SITE_TRANSFER_PROTOCOL_PATH . '/src/WP_Block_Markup_Url_Processor.php';
$requires[] = SITE_TRANSFER_PROTOCOL_PATH . '/src/WP_Migration_URL_In_Text_Processor.php';
$requires[] = SITE_TRANSFER_PROTOCOL_PATH . '/src/WP_URL.php';
$requires[] = SITE_TRANSFER_PROTOCOL_PATH . '/src/functions.php';
$requires[] = SITE_TRANSFER_PROTOCOL_PATH . '/vendor/autoload.php';

foreach ($requires as $require) {
    require_once $require;
}
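The bootstrap depends on files from two other checkouts (`blueprints-library` and `site-transfer-protocol`), and a bare `require_once` stops at the first missing file with a fatal error. A possible variant, sketched below, checks the whole list first and reports every missing path at once. `require_all_or_fail()` is a hypothetical helper, not part of this PR, and the demo files live in a temporary directory created only for the example.

```php
<?php

/**
 * Hypothetical helper (not part of the PR): require every file in $paths,
 * but fail early with one message that names all missing files.
 */
function require_all_or_fail(array $paths): void {
    $missing = array();
    foreach ($paths as $path) {
        if (!is_file($path)) {
            $missing[] = $path;
        }
    }
    if (count($missing) > 0) {
        throw new RuntimeException("Missing required files:\n" . implode("\n", $missing));
    }
    // Nothing is loaded unless every file exists. Duplicates are skipped.
    foreach (array_unique($paths) as $path) {
        require_once $path;
    }
}

// Demo with a throwaway directory so the sketch runs on its own.
$dir = sys_get_temp_dir() . '/bootstrap-sketch-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/a.php', "<?php function bootstrap_sketch_loaded() { return true; }\n");

$error = null;
try {
    require_all_or_fail(array($dir . '/a.php', $dir . '/does-not-exist.php'));
} catch (RuntimeException $e) {
    $error = $e->getMessage();
}

// The first call loaded nothing because one path was missing.
$loaded_after_failed_call = function_exists('bootstrap_sketch_loaded');

// A complete list loads normally, even with a repeated entry.
require_all_or_fail(array($dir . '/a.php', $dir . '/a.php'));
```

One difference from the original loop is that files required inside a function see the function's local variable scope rather than the global one. Classes and functions defined in those files are still global, which is all the files listed above appear to need.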