WP-CLI Out Of Memory on search-replace actions with large serialized objects - Buffer? #64

JanThiel · 2024-05-27T09:23:45Z

Describe the bug
Hey @Pierre-Lannoy hope you are doing well :-)
This is quite an "early" bug report while we have not yet 100% identified the root cause but having some quite reasonable ideas of it. Hoping for some highly valued input of yours.

The issue:
We face(d) massive memory spikes when using WP-CLIs search-replace on tables with many large (~100kb each, several 100s of them) serialized objects. These serialized objects leads WP-CLI to use PHP for the search-replace on these fields.
And as such using the WP PHP API (update_... functions) to de-serialize, search-replace and then serialize and update the data again. Performing this on a table of ~80MB size lead to a memory consumption of >20GB ending with Out of Memory killer being triggered.
Running the same wp search-replace ... with --skip-plugins=decalog mitigated this issue. The search-replace went through in seconds. No memory spikes, no mentionable memory consumption.

We assume - but again have not verified it yet (sorry!) - the buffer option of Decalog to be the root of this. If this applies to WP-CLI calls as well, it might explain the seen behaviour as the log calls with all the data in it are stored in memory and would only be logged once the CLI "request" ends?! Does that makes sense?

If you agree on this as a possible source, could you suggest a "hotfix" we could try out to ignore the buffer option in CLI sessions? We could try this out and give you feedback on it as we have this issue reproducible ;-)

Notable: We use a redis object cache (Object Cache Pro) which is inflicted as well. Yet when disabling Decalog that component does not seem to cause any issues.
And we still use Datadog as target log system.

To Reproduce
Steps to reproduce the behavior (not verified yet!):

Fill the wp_postmeta table with many large, serialized and nested objects
Make sure the buffer option in Decalog is enabled
Run wp search-replace old-value new-value wp_postmeta --dry-run

Memory should spike and the operation should take a long time.

Expected behavior
No memory spike and only some seconds execution time.
Maybe ignore the buffer option in CLI sessions and always flush the log calls directly? This might increase the CLI runtimes slightly but will prevent scaling effects. If these are the case here.

Environment (please complete the following information):

PHP version: 8.1.28
WordPress version and flavour: 6.4.2 Multisite / CLI
Other info: Object Cache / Redis

Thank you very much as always :-)

The text was updated successfully, but these errors were encountered:

Pierre-Lannoy · 2024-05-28T06:12:08Z

Hello @JanThiel !
Hope you're doing well too…
Thank you so much for this comprehensive report, which will - once again - really help me 😍

I will try to reproduce it and see if it's the buffering mechanism that produces this "effect". My preliminary analysis is that your assumption seems correct…
I'm more concerned (if that's the case) about finding a way to solve this problem. My intuition tells me that omitting the buffer is likely to make processing time incredibly long (one API call for each event). Maybe processing bulk-calls (like if the buffer has a "limit" before sending) would be better. Don't know yet.

Anyway, let's start by confirming your analysis.

Can you just tell me if you can see if such a wpcli command produces many events?

Pierre-Lannoy · 2024-05-28T06:17:27Z

Oh, and by the way, if you really need to execute this command now, the best approach for me seems to pause the logger before the command and reactivate it after. You will be "blind" and I agree it's a dirty workaround. But it is better than other methods I can think of…

JanThiel · 2024-05-28T21:32:33Z

Hey @Pierre-Lannoy,

very well over here in Germany except of the common spreading stupidity. But let's not jump into politics :-) Thank you very much!

Using the WP-CLI flag --skip-plugins=decalog works fine :-) As it skips loading decalog within the WP bootstrapping it's an easy workaround with no need of manual deactivation of the plugin. So it's an easy and acceptable workaround we added to our standard procedures internally.

Can you just tell me if you can see if such a wpcli command produces many events?

In general thre aren't really any entries within Datadog about these operations as we have the log level reduced in our Prod env.
We investigated further today and can confirm that WP-CLI does not use the WP PHP API to update the DB values. It's only happening via SQL and a custom PHP script for serialized data ( https://github.com/wp-cli/search-replace-command/blob/main/src/Search_Replace_Command.php#L539 ).

That does make me wonder where Decalog enters the call stack.

What did cross my mind was the question whether it might be helpful to be able to get the Decalog "outputs" printed out within a CLI session when they happen. Like the --debug flat of WP-CLI. That would allow us to actually identify what it might be that happen in Decalog while the search-replace command runs. Would be interesting for manual app debugging as well in general.

Regarding your thoughts on the buffer and a potential limit: I highly agree that no buffer is no solution either. Especially when HTTP calls are invoked. Kind of a buffer limit (time or memory based?) with forced flushes seems smart to me too on second thought.
In addition to the current issues we commonly hit situations in the past where the log output of Decalog was lost whenever a script hit the PHP memory limits and died with OOM as the buffer was unable to be flushed at all. That might be solved by such an approach as well. At least partly. Currently we were kind of blind in these situations. Only the local log files contained the information about the OOM event.

Have a great night,

Jan

Pierre-Lannoy · 2024-06-14T08:05:04Z

Hello @JanThiel !
I haven't forgotten you ;)
So far, I haven't been able to reproduce your problem. I have, however, discovered a memory management problem that could lead to what you're experiencing.
I'm still working on it, but I'll be releasing 4.2.0 version in some hours/days (which adds a few new features for those who use Datadog on custom endpoints - not your case) but that won't solve this issue.
I plan to fix this issue just as soon the 4.2.0 is released.
Sorry for this delay.

What did cross my mind was the question whether it might be helpful to be able to get the Decalog "outputs" printed out within a CLI session when they happen.

There is already a way to do (a sort of) this (but, yes, it requires shmop php extension to be enable): just launch a second terminal and do a wp log tail --level=info. But I understand it is not really what you ask for. Let me think about such a feature…

Have a nice day :)

Pierre-Lannoy · 2024-07-04T17:34:45Z

Hello @JanThiel 😊
So, here are the news about this issue…

I haven't been able to reproduce your issue. Even with high events-load. But I understand you experience it and so, maybe, others. So, I started from the assumption that the root cause that seemed the most likely was the right one and I implemented:

a buffer size limit to send events via batches even if not in shutdown hook;
a "circuit breaker" to allow to not buffering WP CLI events.

The default values are buffer size = 1000 events and no buffering for WP CLI. It is possible to change that by revealing in DecaLog options tab the "advanced settings". To do that, just add: add_filter( 'perfopsone_show_advanced', '__return_true' ); where it seems the more appropriate…

But, as I'm not able to reproduce the issue, I have no way to verify this modification fixes it… If you want to test it, it would be marvelous 😬

JanThiel · 2024-07-15T12:48:54Z

@Pierre-Lannoy Thank you very much - as always. Highly appreciate your thoughtful and forthcoming work :-)

I just needed to confirm that we have reliable test-cases to check against before getting back to you. And I am now comfortable that we have.
I just ran some benchmarks we created and have reproducible increased runtimes of wp search-replace of ~30% when decalog is active. With --skip-plugins=decalog the same CLI runs about 30% faster.

Let me know how to give the fix a shot :-)

Pierre-Lannoy · 2024-07-26T08:08:31Z

Hello @JanThiel !
Have you had a chance to try out the fix?

JanThiel · 2024-07-26T08:44:43Z

Still waiting for your reply:

Let me know how to give the fix a shot :-)

Pierre-Lannoy · 2024-07-26T08:49:43Z

OMG! Sorry…
The current version in master branch contains this fix.
To try it, just checkout the master branch, and use perfopsone_show_advanced hook if you want to tweak/test other parameters.
So sorry for the misunderstanding…

JanThiel · 2024-07-26T08:57:06Z

No problem and thank you very much for the clarification! 😀

I will run the test on Monday and get back to you with the result.

Pierre-Lannoy · 2024-09-09T08:56:20Z

Hello @JanThiel !
Hope you are well.
Have you, by any chance, had time to test this fix?

Pierre-Lannoy · 2024-10-05T15:44:07Z

Hello @JanThiel!
Hope you are well…
Just to let you know this change will be included in version 4.3.0 (release scheduled for next days of October.

JanThiel added the bug Something isn't working label May 27, 2024

JanThiel assigned Pierre-Lannoy May 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WP-CLI Out Of Memory on search-replace actions with large serialized objects - Buffer? #64

WP-CLI Out Of Memory on search-replace actions with large serialized objects - Buffer? #64

JanThiel commented May 27, 2024

Pierre-Lannoy commented May 28, 2024

Pierre-Lannoy commented May 28, 2024

JanThiel commented May 28, 2024 •

edited

Loading

Pierre-Lannoy commented Jun 14, 2024

Pierre-Lannoy commented Jul 4, 2024

JanThiel commented Jul 15, 2024

Pierre-Lannoy commented Jul 26, 2024

JanThiel commented Jul 26, 2024

Pierre-Lannoy commented Jul 26, 2024

JanThiel commented Jul 26, 2024

Pierre-Lannoy commented Sep 9, 2024

Pierre-Lannoy commented Oct 5, 2024

WP-CLI Out Of Memory on search-replace actions with large serialized objects - Buffer? #64

WP-CLI Out Of Memory on search-replace actions with large serialized objects - Buffer? #64

Comments

JanThiel commented May 27, 2024

Pierre-Lannoy commented May 28, 2024

Pierre-Lannoy commented May 28, 2024

JanThiel commented May 28, 2024 • edited Loading

Pierre-Lannoy commented Jun 14, 2024

Pierre-Lannoy commented Jul 4, 2024

JanThiel commented Jul 15, 2024

Pierre-Lannoy commented Jul 26, 2024

JanThiel commented Jul 26, 2024

Pierre-Lannoy commented Jul 26, 2024

JanThiel commented Jul 26, 2024

Pierre-Lannoy commented Sep 9, 2024

Pierre-Lannoy commented Oct 5, 2024

JanThiel commented May 28, 2024 •

edited

Loading