Store vector embeddings for talks, and display related talks #19

crohr · 2023-06-20T15:40:06Z

This PR adds a "Talks you might be interested in" below each talk. This uses the newly released vector support in meilisearch to find talks that share some similarity with the current one.

Support for vector search is currently non-existant in meilisearch-rails and meilisearch-ruby so we cannot easily (?) fetch the similarity scores to keep only the most interesting ones, but the results are quite good as it is.

To make it work, you need to specify an OpenAI API key in .env, and launch a reindex as follows:

Talk.reembed!

Note that vector support is still a bit fiddly, so you may have to start from a fresh meilisearch database if it doesn't work for you on the first try (tip: inspect GET localhost:7700/tasks to see if anything is going wrong when indexing).

Demo below:

rubyvideo-meilisearch-vectors.mp4

Announcement post on meilisearch - AI Search Mechanisms meilisearch/product#621 (comment)

Closes #18.

app/models/talk.rb

config/initializers/meilisearch.rb

adrienpoly · 2023-06-21T05:24:50Z

Thanks @crohr for this prototype it looks very promising.

Embeddings

To move forward if we want to have this in production I guess we could as a first step put in place the logic to compute the embeddings and store them in the talk model. With some kind of logic to recompute them every time title / description changes.
This should prevent recomputing the embeddings for every reindex

Then Meillisearch would just index that new column from our model.

Can this work or I am missing something?

UI

I supposed you put the list below for testing. Ultimately this should feed the cards that are on the right and replace this random suggestion. But while developing let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag so that will make it easier to test

Devops

To release it to prod I will need to update the Meillisearch engine. We probably need to wait a bit as it seems pretty new and they highly recommend waiting. This being said whatever result we get out of it will always be better than a random suggestion.

crohr · 2023-06-21T21:01:12Z

@Kerollmops I've tried using will_save_change_to__vectors? (see https://github.com/meilisearch/meilisearch-rails#custom-attribute-definition) to make meilisearch avoid reindexing vectors in case title or description hasn't changed, but it looks like _vectors is not seen as an attribute when querying the index settings, and therefore the code in meilisearch-rails doesn't go through the will_save method.

  # this is never called
  def will_save_change_to__vectors?
    will_save_change_to_title? || will_save_change_to_description?
  end

Any reason why this line calls settings.get_attributes instead of get_attributes (get_attributes does have the _vectors key)?

crohr · 2023-06-21T21:11:55Z

Embeddings

To move forward if we want to have this in production I guess we could as a first step put in place the logic to compute the embeddings and store them in the talk model. With some kind of logic to recompute them every time title / description changes. This should prevent recomputing the embeddings for every reindex

Then Meillisearch would just index that new column from our model.

Can this work or I am missing something?

I tried to selectively tell meilisearch to ignore _vectors when title or description hasn't changed, but it doesn't seem to work. So yes storing there embedding in sqlite and recomputing with classic AR callbacks if title or description changes would work.

UI

I supposed you put the list below for testing. Ultimately this should feed the cards that are on the right and replace this random suggestion. But while developing let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag so that will make it easier to test

Yes, kept it simple for now, you're the UI guy :) Also, I didn't really notice that videos on the right were supposed to be "more like the current one". I think it could make sense to keep both exploratory videos on the right and related videos below the one you've just visioned, but maybe I'm wrong.

Devops

To release it to prod I will need to update the Meillisearch engine. We probably need to wait a bit as it seems pretty new and they highly recommend waiting. This being said whatever result we get out of it will always be better than a random suggestion.

Not sure how you deploy the meilisearch container / process on the server, but yes since there is nothing in meilisearch that can't be reindexed from sqlite, I think it's ok to deploy alpha/beta software in that case.

Kerollmops · 2023-06-26T15:37:24Z

Hey @crohr 👋

Any reason why this line calls settings.get_attributes instead of get_attributes (get_attributes does have the _vectors key)?

Sorry for the delay. I will summon @brunoocasali on this one. I would expect this to work as for the Ruby integration, _vectors should look the same as any other field.

Note that, currently, Meilisearch isn't particularly smart when only the title or the description is updated it will reindex the document entirely, the vectors too! I am currently working on something that could help in this regard...

Kerollmops · 2023-09-04T13:38:00Z

Hey @crohr 👋 Do you plan to release this recommendation system? We improved the Vector Store solution since then 🧇

adrienpoly · 2023-09-04T16:05:00Z

@Kerollmops thanks for the update I will look into updating the Meilisearch version into the hosting platform. There is an official docker image available now? If I understand correctly I need a 1.3+ version to enable it right?

Kerollmops · 2023-09-05T06:29:13Z

@adrienpoly Indeed, you need a v1.3.x and we provide a Docker image and all sort of binaries ☺️

crohr · 2023-09-05T07:18:57Z

Hey @crohr 👋 Do you plan to release this recommendation system? We improved the Vector Store solution since then 🧇

Sure, I will have another look this week, thanks for the update!

…ription changed

crohr · 2023-09-06T09:00:44Z

app/models/talk.rb

    attribute :title
    attribute :description
    attribute :slug
-    attribute :video_id
+    # ⚠️ This `video_id` attribute makes indexing (silently) fail with v1.3.2 of meilisearch. Error message from meilisearch (GET /tasks):


@Kerollmops I wonder if there is a regression here with v1.3.2, since it fails to index even if I specifically set the primary_key: :id as above.

I recently specified the primary key to index one of my datasets and haven't had issues. Are you sure it was introduced in v1.3.2? Is it related to the ruby SDK? I used the primaryKey query parameter with the POST /indexes/{indexname}/documents route.

hi @crohr why do you need to specify the primary key?

It was just an attempt to fix the issue I'm explaining in the comment, but it doesn't make a difference (and shouldn't since this is the default)

crohr · 2023-09-06T09:03:23Z

app/models/talk.rb

    attribute :video_provider
    attribute :thumbnail_sm
    attribute :thumbnail_md
    attribute :thumbnail_lg
    attribute :speakers do
      speakers.pluck(:name)
    end
+    # ⚠️ This must return nil and not an empty array if no vector is available.
+    # Otherwise all other indexing tasks with non-zero vector arrays will silently fail, since the engine will expect all vectors to have the same length.


@Kerollmops this was by far the most time-consuming issue I had with meilisearch vector indexing: seems like meilisearch rejects any document that doesn't have a vector of the same size as the first one it has seen. If you mistakenly send a zero-sized vector (or incorrectly sized vector) for your first document, then all other documents with a different vector size will be rejected. This should be put in a big warning somewhere, or hopefully have some better error message in the server logs (had to inspect the tasks list to see this error).

Thank you for the report @crohr. I shared your feedback with the team and will see what we can do. We recently fixed a bug with the _vectors field that was producing issues when specifying multiple vectors for a single document. This one will be released in v1.3.3.

crohr · 2023-09-06T09:04:21Z

app/models/talk.rb

+  # Recomputes embedding for all talks that don't have one yet.
+  def self.reembed!(sleep_interval: 2.seconds, limit: nil)
+    # required for querying vectors (not indexing)
+    MeiliSearch::Rails.client.http_patch "/experimental-features", {vectorStore: true}


@Kerollmops could this be auto-enabled in some way if we start to send vectors?

Unfortunately not. You must enable it by hand 😞

crohr · 2023-09-06T09:07:33Z

@adrienpoly ready to be reviewed, I've updated the description with the new task to run

adrienpoly · 2023-10-23T20:44:47Z

thanks @crohr I made a bit of preparation work in #64 to integrate on the front end part the results of the suggestions and to isolate then into a frame so that initial page load for the Talk#show route is not coupled to this suggestion method.

My next step is to upgrade Meilisearch in the prod environment and it is not as plug and play as I hope it would be... Anyway it is not neither a very high traffic site so If search is down for a little time that should be ok 😄

Will try to look back into this soon. Thanks for your work

Store vector embeddings for talks

4d99696

crohr commented Jun 20, 2023

View reviewed changes

app/models/talk.rb Show resolved Hide resolved

Working!

5807118

crohr changed the title ~~Store vector embeddings for talks~~ Store vector embeddings for talks, and display related talks Jun 20, 2023

crohr marked this pull request as ready for review June 20, 2023 18:58

Kerollmops reviewed Jun 20, 2023

View reviewed changes

app/models/talk.rb Outdated Show resolved Hide resolved

adrienpoly reviewed Jun 21, 2023

View reviewed changes

config/initializers/meilisearch.rb Show resolved Hide resolved

adrienpoly mentioned this pull request Jun 21, 2023

Black bands on videos cards #20

Closed

Attempt to not save _vectors if title or description hasn't changed

5f4d277

crohr added 2 commits September 6, 2023 07:24

Merge remote-tracking branch 'upstream/main' into feature/similar-videos

cde60c2

Store embedding in SQLite to avoid recomputation unless title or desc…

83197d1

…ription changed

crohr commented Sep 6, 2023

View reviewed changes

This was referenced Oct 18, 2023

Litesearch POC #58

Closed

Recommended talks #64

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store vector embeddings for talks, and display related talks #19

Store vector embeddings for talks, and display related talks #19

crohr commented Jun 20, 2023 •

edited

Loading

adrienpoly commented Jun 21, 2023

crohr commented Jun 21, 2023 •

edited

Loading

crohr commented Jun 21, 2023

Embeddings

UI

Devops

Kerollmops commented Jun 26, 2023

Kerollmops commented Sep 4, 2023

adrienpoly commented Sep 4, 2023

Kerollmops commented Sep 5, 2023

crohr commented Sep 5, 2023

crohr Sep 6, 2023

Kerollmops Sep 6, 2023

brunoocasali Sep 6, 2023

crohr Sep 6, 2023

crohr Sep 6, 2023 •

edited

Loading

Kerollmops Sep 6, 2023 •

edited

Loading

crohr Sep 6, 2023

Kerollmops Sep 6, 2023

crohr commented Sep 6, 2023

adrienpoly commented Oct 23, 2023

Store vector embeddings for talks, and display related talks #19

Are you sure you want to change the base?

Store vector embeddings for talks, and display related talks #19

Conversation

crohr commented Jun 20, 2023 • edited Loading

adrienpoly commented Jun 21, 2023

Embeddings

UI

Devops

crohr commented Jun 21, 2023 • edited Loading

crohr commented Jun 21, 2023

Embeddings

UI

Devops

Kerollmops commented Jun 26, 2023

Kerollmops commented Sep 4, 2023

adrienpoly commented Sep 4, 2023

Kerollmops commented Sep 5, 2023

crohr commented Sep 5, 2023

crohr Sep 6, 2023

Choose a reason for hiding this comment

Kerollmops Sep 6, 2023

Choose a reason for hiding this comment

brunoocasali Sep 6, 2023

Choose a reason for hiding this comment

crohr Sep 6, 2023

Choose a reason for hiding this comment

crohr Sep 6, 2023 • edited Loading

Choose a reason for hiding this comment

Kerollmops Sep 6, 2023 • edited Loading

Choose a reason for hiding this comment

crohr Sep 6, 2023

Choose a reason for hiding this comment

Kerollmops Sep 6, 2023

Choose a reason for hiding this comment

crohr commented Sep 6, 2023

adrienpoly commented Oct 23, 2023

crohr commented Jun 20, 2023 •

edited

Loading

crohr commented Jun 21, 2023 •

edited

Loading

crohr Sep 6, 2023 •

edited

Loading

Kerollmops Sep 6, 2023 •

edited

Loading