Disabling Scout sync during Feed Me Import #180

Open · mgburns opened this issue Nov 23, 2020 · 19 comments

@mgburns (Contributor) commented Nov 23, 2020

I'm trying to figure out how I can temporarily disable Algolia sync during Feed Me imports.

I know it can be done manually via the settings page if we omit it from config/scout.php, but we hide plugin settings in prod, so I'm trying to figure out a way to do it programmatically. I see that Feed Me fires events before/after feed processing, but I'm not sure what I would do on the Scout end to make this work.

@timkelty (Collaborator) commented Dec 7, 2020

@mgburns Yep, agreed this is annoying. Adding to my list!

@lenvanessen

@timkelty would you accept a PR for this? I need this as well, and would most likely go with the same approach as Blitz (a batch mode flag), sketched below.
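
For context, a Blitz-style batch mode boils down to a runtime flag that sync event handlers consult before queueing work. Here's a minimal sketch of the idea; every name in it is hypothetical, not actual Scout or Blitz API:

// Hypothetical "batch mode" toggle; none of these names exist in Scout or Blitz.
final class BatchMode
{
    private static bool $enabled = false;

    public static function enable(): void
    {
        self::$enabled = true;
    }

    public static function disable(): void
    {
        self::$enabled = false;
    }

    public static function isEnabled(): bool
    {
        return self::$enabled;
    }
}

// A sync event handler would then bail out early while batch mode is on:
// if (BatchMode::isEnabled()) { return; }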

@janhenckens (Member)

Hey @lenvanessen! Sure, happy to review a PR for this if you can write it up 👍

@joepagan commented Mar 3, 2022

Just chipping in: if your indices are limited to just the Commerce elements touched by the import, you could set this in your scout.php config:

'sync' => false,

And then refresh the index with a Feed Me event:

use Craft;
use craft\feedme\events\FeedProcessEvent;
use craft\feedme\services\Process;
use yii\base\Event;
use yii\base\Exception;
use yii\base\InvalidRouteException;

Event::on(
    Process::class,
    Process::EVENT_AFTER_PROCESS_FEED,
    function(FeedProcessEvent $event) {
        try {
            // Re-import everything once the feed has finished processing
            Craft::$app->runAction('scout/index/refresh');
        } catch (InvalidRouteException|Exception $e) {
            Craft::error($e->getMessage(), __METHOD__);
            throw new Exception($e->getMessage());
        }
    }
);

@jamesmacwhite (Contributor)

Has anyone made any progress on turning off Algolia sync events during a Feed Me import, or deferring them until the end?

Disabling the sync outright prevents entries updated by Control Panel users from being synchronised automatically, so it's not an ideal solution. You can hook into various Feed Me events to trigger imports instead, but I was wondering if anyone had thoughts on a solution that keeps the original sync option enabled.

@lenvanessen

@jamesmacwhite I've done it using a listener on the queue:

use craft\queue\Queue;
use rias\scout\jobs\MakeSearchable;
use yii\base\Event;
use yii\queue\PushEvent;

/**
 * Prevent Scout's MakeSearchable jobs from being pushed during console
 * requests. This way we don't bloat the queue by syncing products
 * individually; we just trigger a bulk sync after the import completes.
 */
private function _preventSyncJobs(): void
{
    Event::on(
        Queue::class,
        Queue::EVENT_BEFORE_PUSH,
        function(PushEvent $event) {
            if ($event->job instanceof MakeSearchable) {
                // Marking the event as handled stops the job being queued
                $event->handled = true;
            }
        }
    );
}
    
And then, I've bound this so it's only triggered for console requests, in my module's init:

if (Craft::$app->getRequest()->isConsoleRequest) {
    $this->_preventSyncJobs();
}

@jamesmacwhite (Contributor)

@lenvanessen Interesting solution, I'll take a look.

You can technically disable the queue jobs with 'queue' => false. This obviously has performance implications, but if you run the queue with a separate worker you can do that without too many issues. It does mean the syncs are still triggered individually during feed importing, though.
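
For reference, a minimal sketch of that setting (with the queue disabled, Scout indexes synchronously on save instead of pushing jobs):

// config/scout.php
return [
    'queue' => false,
];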

I am however a fan of triggering these at the end of a successful feed import, leveraging feed events, because it will probably also speed up the import process.

Do you still have sync enabled with this before-push event code, so general entry saves in the CP aren't impacted?

That's my main concern: the feed imports are one thing, but I still want the automatic sync behaviour to work when entries not imported via Feed Me are modified.

@lenvanessen

@jamesmacwhite, there are a couple of solutions you can go with. For one, you can temporarily change the .env; you can see an example in Craft's SetupController, where they set certain variables.

So in your case, you could set this in scout.php (with use craft\helpers\App; imported at the top):

'queue' => App::env('ENABLE_SCOUT_QUEUE') === 'true',

And then, in the Feed Me events:

use Craft;
use craft\feedme\events\FeedProcessEvent;
use craft\feedme\services\Process;
use yii\base\Event;

Event::on(
    Process::class,
    Process::EVENT_BEFORE_PROCESS_FEED,
    function(FeedProcessEvent $event) {
        // Write the flag as a string, since the config compares against 'true'
        Craft::$app->getConfig()->setDotEnvVar('ENABLE_SCOUT_QUEUE', 'false');
    }
);

And once the import is done:

Event::on(
    Process::class,
    Process::EVENT_AFTER_PROCESS_FEED,
    function(FeedProcessEvent $event) {
        // Re-enable Scout's queue jobs after the import finishes
        Craft::$app->getConfig()->setDotEnvVar('ENABLE_SCOUT_QUEUE', 'true');
    }
);

Or, if you don't want to touch the .env, you could set a mutex lock:

Event::on(
    Process::class,
    Process::EVENT_BEFORE_PROCESS_FEED,
    function(FeedProcessEvent $event) {
        // Wait up to 15 seconds to acquire the lock
        Craft::$app->getMutex()->acquire('FEED_ME_PROCESSING', 15);
    }
);

And then change the code I previously shared to this:

Event::on(
    Queue::class,
    Queue::EVENT_BEFORE_PUSH,
    function(PushEvent $event) {
        if ($event->job instanceof MakeSearchable) {
            // Drop the job only while the Feed Me lock is held
            $event->handled = Craft::$app->getMutex()->isAcquired('FEED_ME_PROCESSING');
        }
    }
);

And then after the import is done, release it:

Event::on(
    Process::class,
    Process::EVENT_AFTER_PROCESS_FEED,
    function(FeedProcessEvent $event) {
        Craft::$app->getMutex()->release('FEED_ME_PROCESSING');
    }
);

@jamesmacwhite (Contributor)

@lenvanessen Thanks, this is an interesting solution! My concern with using a mutex lock is that you could potentially trigger a mutex lock error if a Feed Me import happens to be running while an entry connected to Algolia events (not part of Feed Me) is updated at the same time, but I'll have to do some testing.
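
One defensive tweak worth noting (a sketch, not from the thread): yii\mutex\Mutex::acquire() returns a bool, so the before-process handler can at least log when the lock could not be obtained rather than failing silently:

// Sketch: surface lock contention instead of ignoring it
if (!Craft::$app->getMutex()->acquire('FEED_ME_PROCESSING', 15)) {
    Craft::warning('Could not acquire FEED_ME_PROCESSING; Scout jobs will not be paused for this import.', __METHOD__);
}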

@jamesmacwhite (Contributor) commented Jan 27, 2025

Revisiting this, as I think it's an area that needs consideration or at least some official documentation.

If you have a lot of Feed Me feeds running, raising an indexing queue job per element becomes inefficient; as discussed, a single batch queue job that syncs the index after the feed import has completed seems better.

Turning off sync outright isn't the solution, because it wouldn't work for non-Feed Me contexts. I think the best solution would be to turn off sync events during Feed Me imports, raise an index refresh at the end, and then re-enable sync events. The only issue is the limited but real possibility of non-Feed Me elements not being synced while a Feed Me import is running.

The other consideration is ensuring the index sync queue job provided by Scout is batchable, in case the number of items being refreshed is large and exceeds the queue TTR.
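
For illustration, here's a rough sketch of what a batchable job can look like using Craft's BaseBatchedJob (Craft 4.4+), which splits work into batches and re-queues itself so no single job exceeds the TTR. The section handle and the indexing step are hypothetical placeholders, not Scout's actual API:

use craft\base\Batchable;
use craft\db\QueryBatcher;
use craft\elements\Entry;
use craft\queue\BaseBatchedJob;

class BatchedScoutImport extends BaseBatchedJob
{
    // Hypothetical: the section whose entries should be re-indexed
    public string $section = 'news';

    protected function loadData(): Batchable
    {
        // Each batch pulls the next slice of this element query
        return new QueryBatcher(Entry::find()->section($this->section));
    }

    protected function processItem(mixed $item): void
    {
        // Hypothetical indexing step; Scout's real API differs.
        // e.g. hand $item to whatever performs the Algolia upsert.
    }

    protected function defaultDescription(): ?string
    {
        return "Re-indexing Scout entries for section {$this->section}";
    }
}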

Thoughts?

@jamesmacwhite (Contributor)

This is the solution I have tested.

  1. Stop the IndexElement jobs from being pushed into the queue by Scout while the custom mutex lock is held, i.e. once a Feed Me import has started.
  2. Implement two Feed Me event handlers, for before and after feed processing, per the concept originally documented by @lenvanessen. The after-process handler releases the lock and then pushes an ImportIndex job to update the elements in a single job.

In testing, entries not held back by the Feed Me mutex lock still sync as normal, so this approach doesn't require any env var manipulation.

// Namespaces below are assumed from Scout/Feed Me; adjust to your versions
use Craft;
use craft\feedme\events\FeedProcessEvent;
use craft\feedme\services\Process;
use craft\helpers\Queue as QueueHelper;
use craft\queue\Queue;
use rias\scout\jobs\ImportIndex;
use rias\scout\jobs\IndexElement;
use yii\base\Event;
use yii\queue\PushEvent;

Event::on(
    Queue::class,
    Queue::EVENT_BEFORE_PUSH,
    static function(PushEvent $event) {
        if ($event->job instanceof IndexElement) {
            $event->handled = Craft::$app->getMutex()->isAcquired('FEED_ME_PROCESSING');
        }
    }
);

Event::on(
    Process::class,
    Process::EVENT_BEFORE_PROCESS_FEED,
    static function(FeedProcessEvent $event) {
        $feed = $event->feed;
        Craft::info("Acquiring mutex lock for pausing Scout indexing element jobs during Feed Me import of feed: $feed->name.");
        Craft::$app->getMutex()->acquire('FEED_ME_PROCESSING', 15);
    }
);

Event::on(
    Process::class,
    Process::EVENT_AFTER_PROCESS_FEED,
    static function(FeedProcessEvent $event) {

        $feed = $event->feed;
        $feedId = (int)$feed->id;

        // Map feed ID to Algolia index
        $feedMeScoutIndices = [
            4 => 'Index1',
            7 => 'Index2',
            9 => 'Index3'
        ];

        Craft::info("Releasing mutex lock for Scout indexing elements, Feed Me processing finished for feed: $feed->name.");
        Craft::$app->getMutex()->release('FEED_ME_PROCESSING');

        // Check if the feed ID is related to an Algolia index in Scout for running an ImportIndex job
        $scoutAlgoliaIndex = $feedMeScoutIndices[$feedId] ?? null;

        if ($scoutAlgoliaIndex) {

            $queueJob = QueueHelper::push(new ImportIndex([
                'indexName' => $scoutAlgoliaIndex
            ]));

            if (!$queueJob) {
                Craft::error("Failed to queue Scout ImportIndex job for Algolia index: $scoutAlgoliaIndex");
            }
            else {
                Craft::info("Successfully queued Scout ImportIndex job for Algolia index: $scoutAlgoliaIndex with queue job ID: $queueJob");
            }
        }
    }
);

@timkelty (Collaborator) commented Feb 5, 2025

For Craft 5, it sounds like Bulk Operations is how this should be tackled.

I'm already in the process of converting Feed Me to batch jobs, so will post back here when that is complete.
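
For anyone exploring this, a rough sketch of reacting to the end of a bulk operation in Craft 5. The event and query-param names below are my understanding of Craft 5's bulk ops feature; verify them against the current docs before relying on this:

use Craft;
use craft\elements\Entry;
use craft\events\BulkOpEvent;
use craft\services\Elements;
use yii\base\Event;

// Sketch: react once per bulk operation instead of once per element.
Event::on(
    Elements::class,
    Elements::EVENT_AFTER_BULK_OP,
    static function(BulkOpEvent $event) {
        // inBulkOp() narrows the query to elements touched by this operation
        $entryIds = Entry::find()->inBulkOp($event->key)->ids();
        Craft::info('Bulk op ' . $event->key . ' touched ' . count($entryIds) . ' entries.', __METHOD__);
        // ...push a single batched Scout import for these IDs here
    }
);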

@janhenckens (Member)

That would be a massive improvement @timkelty! It would give plugins - like Scout - a clean way to piggyback off those batched operations.

@timkelty (Collaborator) commented Feb 5, 2025

@janhenckens I'll work on the Feed Me side, but you can/should implement the bulk op stuff regardless, as it will come into play, e.g. when someone runs a CLI resave/elements.

Holler if you need any guidance. As it's relatively new, the docs are a bit light.

@jamesmacwhite (Contributor)

Just wanted to chime in and say this new bulk op event is amazing and should make Feed Me and Scout better optimised for large indexing jobs.

@janhenckens (Member)

> @janhenckens I'll work on the Feed Me side, but you can/should implement the bulk op stuff regardless, as it will come into play, e.g. when someone runs a CLI resave/elements.

Oh, that's right. I'll have a look and see if I can detect whether something is a bulk operation versus a single save. I'll ping you on Discord should I get stuck :)

@jamesmacwhite (Contributor)

Running Scout indexing jobs via a bulk operation would be very welcome. In conjunction with Feed Me or resave/entries, the Scout plugin can currently raise a very large number of individual indexing element jobs from such events, causing a queue backlog.

@timkelty (Collaborator)

Feed Me PR for batch jobs, thanks to @i-just craftcms/feed-me#1598

@jamesmacwhite (Contributor)

Looking forward to when Scout can use bulk operations to avoid individual element indexing jobs for Feed Me imports.
