Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typesense Import: Index Race Condition #845

Closed
saibotk opened this issue Jul 8, 2024 · 18 comments
Closed

Typesense Import: Index Race Condition #845

saibotk opened this issue Jul 8, 2024 · 18 comments

Comments

@saibotk
Copy link
Contributor

saibotk commented Jul 8, 2024

Scout Version

10.9.0

Scout Driver

Typesense

Laravel Version

10.48.10

PHP Version

8.3.9

Database Driver & Version

Typesense 0.25.2

SDK Version

Typesense PHP 4.9.3

Meilisearch CLI Version

No response

Description

When deleting the index and afterwards importing all models, we can reach a race condition, where the import fails because some jobs run in parallel and all try to create or access the index.

Only one succeeds and the others will, for example, throw this error:
Typesense\Exceptions\ObjectNotFound: Not Found
or interestingly, only rarely, Typesense\Exceptions\ObjectAlreadyExists: A collection with name already exists..

This happens when using queued imports. I don't exactly understand why it returns Not Found instead of the more rare collection already exists error, which is logical, but this might be some technical detail in the typesense engine.

E.g. using Horizon with minProcesses = 5 and running the following command on a table with enough entries e.g. ~ 40.000 will sometimes result in those exceptions on some of the jobs.

We should try to make the import more robust and maybe ensure that the index is created before the documents are imported? I'm happy to hear your solutions on this, but i think we should fix this flaw in scout directly.

Steps To Reproduce

In any scout repository with Horizon or some queue workers and a table with enough elements such that chunking happens, run:

php artisan scout:delete-index "App\Models\User" && php artisan scout:import "App\Models\User"

And sometimes the race condition will cause some jobs to fail.

@saibotk saibotk changed the title Typesense Import Index Creation Race Condition Typesense Import: Index Race Condition Jul 8, 2024
@driesvints
Copy link
Member

@saibotk
Copy link
Contributor Author

saibotk commented Jul 8, 2024

Seems like #822 is related

@saibotk
Copy link
Contributor Author

saibotk commented Jul 8, 2024

I just did some more deep-diving:

  • scout:flush in typesense does also delete the schema / index entirely on the server, but the local exists bool is never reset to null, so it will assume that it exists. This is one of the issues that we need to solve. I think this is the cause of the ObjectNotFound exception. I found this by using flush instead of delete-index.
  • the other one still needs investigation, maybe there are different code paths and some weird race conditions on what horizon process has which state etc. idk

A small workaround:
When calling artisan scout:flush or scout:delete-index run this afterwards (maybe create a wrapper artisan command):

use Laravel\Scout\EngineManager;
use App\Models\Commodity;

$ts = app(EngineManager::class)->engine();
$model = new Commodity();

$ts->flush($model);
$ts->getCollections()->offsetUnset($model->searchableAs());

Note: This will only happen in queue workers, where i suppose the engine is kept alive? because the process is not restarted after a job?
PS: Yes, so this is mainly a caching / state management issue, due to the collection exists cache:
CleanShot 2024-07-08 at 17 15 38

@driesvints
Copy link
Member

Thanks a lot for your investigation @saibotk. If you're up for it, definitely feel free to send in a PR with a proposed change.

@jasonbosco
Copy link
Contributor

jasonbosco commented Jul 9, 2024

@saibotk thank you for digging into this. Let me know if you're planning to do a PR for this.

If not, @tharropoulos from the Typesense team is also available to look into this.

@saibotk
Copy link
Contributor Author

saibotk commented Jul 9, 2024

I could try to come up with some fix for this, but am currently pretty swamped.
Also I think it is necessary to see how this might need some broader change. Because currently i don't know how the cached exist came to be and in which way we need to preserve the functionality of that caching.

Obviously we could just drop the cache and always do an exists query but im also not sure if there might be some better solution using other typesense methods.

Im open to discuss approaches and would appreciate any insights

@tharropoulos
Copy link
Contributor

tharropoulos commented Jul 10, 2024

When I tried to take a look at the server logs, if you use queue:work, these requests are sent:
image

If you then restart the worker and execute the import once again, these requests are sent to Typesense:
image

If you then use queue:listen, these requests are sent, and the jobs don't fail.
image

It must be applicable to the static state generated when flushing the collection, since the change is reflected on the queue:listen command. I'm still looking into it, as I'm not sure what's the best way to approach this would be. I'm not sure we should drop the cache as @saibotk already mentioned.

EDIT: provided some more info regarding the differences between queue:listen and queue:work

@tharropoulos
Copy link
Contributor

The getOrCreateCollectionFromModel method in TypesenseEngine is not calling out to the API, having its state of collections as a single source of truth:

protected function getOrCreateCollectionFromModel($model, bool $indexOperation = true): TypesenseCollection
{
$method = $indexOperation ? 'indexableAs' : 'searchableAs';
$collection = $this->typesense->getCollections()->{$model->{$method}()};
if ($collection->exists() === true) {
return $collection;
}

To ensure matching the state from the server to the worker, if a collection is found, it should then call out to the Typesense server to see if the collection exists on the server as well, if not, it will have to create it on the server.

    protected function getOrCreateCollectionFromModel($model, bool $indexOperation = true): TypesenseCollection
    {
        $method = $indexOperation ? 'indexableAs' : 'searchableAs';

        $collection = $this->typesense->getCollections()->{$model->{$method}()};

        // Initialize a flag to check if the collection exists
        $collectionExists = false;
        if ($collection->exists()) {
            $collectionName = $model->{$method}();
            try  {
                $this->typesense->collections[$collectionName]->retrieve();
                $collectionExists = true;
            } catch (TypesenseClientError $e){
                $collectionExists = false;
            }
        }

        // Use the flag to determine the next steps
        if ($collectionExists) {
            return $this->typesense->getCollections()->{$collectionName};
        }
...

@tharropoulos
Copy link
Contributor

tharropoulos commented Jul 10, 2024

@saibotk First of all, thank you for taking the time to investigate so thoroughly and getting us in the right track. Could you verify that this change fixes the problems on your side as well?

EDIT: fix typo

taylorotwell added a commit that referenced this issue Jul 10, 2024
…846)

* feat(typesense): sync server state in getOrCreateCollectionFromModel

This commit updates the `getOrCreateCollectionFromModel` method in
`TypesenseEngine` to verify if a collection exists on the server. If not
found, it is created. This ensures consistency between the worker and
server states.

* style: fix styleCI errors

refactor: remove uneeded cast of flag

The collectionExists flag was already cast as false, so recasting it
serves no purpose

* Update TypesenseEngine.php

---------

Co-authored-by: Taylor Otwell <[email protected]>
@saibotk
Copy link
Contributor Author

saibotk commented Jul 10, 2024

Yes i will do so on friday when im back at work thanks for taking the time 🙌

@saibotk
Copy link
Contributor Author

saibotk commented Jul 12, 2024

@tharropoulos Okay got time to test this now and thats my result:

  • Importing after flush now works and is not dependent on the static state in the worker, thats nice!
  • If you quickly repeat a "flush && import" command you can still get it to print a message that collection X does not exist (even though it does not throw an error) That's correct because flush is immediate and thus a job might try to post a request to the import endpoint when the collection was just deleted, so nothing wrong here!
  • Also I think there might be some edge cases where this still fails or rather just tries to create a collection even though it already exists (if it is set to false), but maybe that is fine by design?
  • With this change i don't know why we now even bother keeping track of the "exists" variable. It seems like this state is only used inside of the getOrCreate function and no where else. It could even be deleted entirely from this repository and the typesense php sdk too, since we now check if it actually exists anyway.

What do you think?

Thank you for the quick response!

saibotk added a commit to saibotk/scout that referenced this issue Jul 12, 2024
In //laravel#820 we introduced an exists check method, which then lead to static state issues as described in laravel#845

Those where addressed in laravel#846

But now that the function always double-checks the existence, we can remove the exists check entirely and only rely on the Typesense server response for this state.

This "fixes" issues where the server already has a collection and the client would try to recreate it. E.g. when it has flushed the index and another process or worker then creates the collection.
saibotk added a commit to saibotk/scout that referenced this issue Jul 12, 2024
In laravel#820 we introduced an exists check method, which then lead to static state issues as described in laravel#845

Those where addressed in laravel#846

But now that the function always double-checks the existence, we can remove the exists check entirely and only rely on the Typesense server response for this state.

This "fixes" issues where the server already has a collection and the client would try to recreate it. E.g. when it has flushed the index and another process or worker then creates the collection.
saibotk added a commit to saibotk/scout that referenced this issue Jul 12, 2024
In laravel#820 we introduced an exists check method, which then lead to static state issues as described in laravel#845

Those where addressed in laravel#846

But now that the function always double-checks the existence, we can remove the exists check entirely and only rely on the Typesense server response for this state.

This "fixes" issues where the server already has a collection and the client would try to recreate it. E.g. when it has flushed the index and another process or worker then creates the collection.
saibotk added a commit to saibotk/scout that referenced this issue Jul 12, 2024
In laravel#820 we introduced an exists check method, which then lead to static state issues as described in laravel#845

Those where addressed in laravel#846

But now that the function always double-checks the existence, we can remove the exists check entirely and only rely on the Typesense server response for this state.

This "fixes" issues where the server already has a collection and the client would try to recreate it. E.g. when it has flushed the index and another process or worker then creates the collection.
@driesvints
Copy link
Member

Just checking in after being out for a bit: was this fixed now?

@saibotk
Copy link
Contributor Author

saibotk commented Jul 22, 2024

@driesvints Yes the issue itself is fixed and this can be closed, the followup PR is just to improve the fix.
Thanks!

@saibotk saibotk closed this as completed Jul 22, 2024
taylorotwell added a commit that referenced this issue Aug 5, 2024
* refactor(typesense): remove unused exists checks

In #820 we introduced an exists check method, which then lead to static state issues as described in #845

Those where addressed in #846

But now that the function always double-checks the existence, we can remove the exists check entirely and only rely on the Typesense server response for this state.

This "fixes" issues where the server already has a collection and the client would try to recreate it. E.g. when it has flushed the index and another process or worker then creates the collection.

* Update TypesenseEngine.php

---------

Co-authored-by: Taylor Otwell <[email protected]>
@patrickomeara
Copy link

I just tried to import ~1.6m rows in chunks of 500 and hit this error 431 times before hitting a memory issue on a node.

I'm on TypeSense cloud v26 and Scout v10.11. I ran the import directly after a flush.

Is there a recommended way to do an initial import. I'll turn the queue off and try again once I have upgraded my nodes in TypeSense Cloud.

@saibotk
Copy link
Contributor Author

saibotk commented Aug 6, 2024

Weird, the main issue should be fixed in that version.

Though can you try out the latest commit? I'm curious whether that fixes something for your use case @patrickomeara.

@patrickomeara
Copy link

OK, after the node upgrade had finished I flushed and re-imported and only got the ObjectAlreadyExists exception once. It could be that my horizon configuration overwhelmed the smaller TypeSense cluster's ingest capabilities for the first few minutes. I doubled the memory and it imported fine.

Thanks for your swift reply @saibotk

@liran-co
Copy link

I believe the issue reported here still exists after #846. If you import greater than the chunk size (500) on a queue server, there exists a race condition that can cause the Typesense\Exceptions\ObjectAlreadyExists: A collection with name already exists. error. Note that this occurs even on a fresh import without a flush before.

@liran-co
Copy link

Throwing a try/catch around the collection creation call at line 526 fixes the issue. It occurs when two chunks are picked up by two queue processes at the same time, causing a race condition:

        try {
            $this->typesense->getCollections()->create($schema);

            $collection->setExists(true);
        } catch (TypesenseClientError $e) {
            $collection->retrieve();

            $collection->setExists(true);
        }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants