Litesearch POC #58

Closed


adrienpoly
Owner

This is mostly a POC at this time to test the Litesearch capabilities.

It is very straightforward to integrate with Active Record, and the ability to set a weight is a plus. Being able to replace Meilisearch here would be nice, to have something easier to install and deploy. Maybe we can keep the vector-based recommendations from #19.

The current limitation is that I cannot search on speaker names, as through associations are not yet supported (oldmoe/litestack#45).

@adrienpoly adrienpoly changed the base branch from main to litedb-adapter October 18, 2023 00:06
@adrienpoly adrienpoly marked this pull request as draft October 19, 2023 20:24
Base automatically changed from litedb-adapter to main October 22, 2023 15:35
@oldmoe

oldmoe commented Oct 26, 2023

Regarding similarity search: instead of similarity matching, and since Litesearch sorts by rank by default, did you think of extracting the most significant words from the current video's title and description and then doing an OR search with them? The resulting set would be sorted with the closest matches to the search query first. The trick here would be to manually get rid of what could be considered stop words (currently Litesearch has no facility for doing so).
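To make the suggestion above concrete, here is a minimal sketch in plain Ruby. It is only an illustration under stated assumptions: the `STOP_WORDS` list, the `similarity_query` helper, and the use of word frequency as a stand-in for "most significant" are all hypothetical and are not part of Litesearch.

```ruby
# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = %w[a an and are as at be by for from how in is it of on or
                that the this to we with you your].freeze

# Build an OR query from the most frequent non-stop words in the title and
# description. Frequency is a crude proxy for "significance" (assumption).
def similarity_query(title, description, max_terms: 5)
  words = "#{title} #{description}".downcase.scan(/[a-z0-9]+/)
  words = words.reject { |w| w.length < 3 || STOP_WORDS.include?(w) }
  # Sort by descending count, then alphabetically so the result is stable.
  top = words.tally.sort_by { |word, count| [-count, word] }
             .first(max_terms).map(&:first)
  # Quote each term so it is treated as a plain string, and join with OR.
  top.map { |w| %("#{w}") }.join(" OR ")
end

puts similarity_query("Scaling Rails with SQLite",
                      "How to scale a Rails app on SQLite", max_terms: 2)
```

The resulting string could then be passed to the search call, which would return results in rank order.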

@adrienpoly
Owner Author

> Regarding similarity search: instead of similarity matching, and since Litesearch sorts by rank by default, did you think of extracting the most significant words from the current video's title and description and then doing an OR search with them? The resulting set would be sorted with the closest matches to the search query first. The trick here would be to manually get rid of what could be considered stop words (currently Litesearch has no facility for doing so).

Yeah, I thought about that, but going that route I feel I'll be reinventing a search engine. This is where the SQLite + Meilisearch combo was interesting, as Meilisearch already brings all of this. The pain point I have with Meilisearch is that upgrades are not really easy. I'll see if a simple Litesearch setup is good enough, especially once I have some tag filters available.

@oldmoe

oldmoe commented Oct 28, 2023

I can try to hide much of the complexity and offer a model#similar method on AR objects; that could be a nice abstraction.

@oldmoe

oldmoe commented Nov 2, 2023

Litesearch now has a similar method on the index, and on any AR or Sequel model object.
You can do something like video.similar(limit) to get a list of similar videos ordered by similarity. The limit defaults to 10 entries if not supplied. This is not in the released gem yet; I would love to see whether the (admittedly naïve) approach is useful on actual data.

@adrienpoly
Owner Author

adrienpoly commented Nov 2, 2023

@oldmoe great, given the underlying job in #64, it will be easy to test it in real life. I will look at it soon, hopefully.

@adrienpoly
Owner Author

@oldmoe I tried to run it from the master branch, but I am getting this error:

gems/litestack-5d383d83c767/lib/litestack/litedb.rb:131:in `initialize': no such table: talks_search_idx_row (SQLite3::SQLException)

I tried to run this in the console:

 Talk.rebuild_index!

but it returns the same error.

If I roll back to the latest official release, Litesearch works OK (but without similarity search).

@adrienpoly
Owner Author

OK, I made some progress. I had to go back to the previous version, drop the index, then switch back to master and rebuild the index.

Now I am getting this error:
[Screenshot of the error: CleanShot 2023-11-02 at 22 39 35@2x]

@oldmoe

oldmoe commented Nov 4, 2023

Thanks for trying it out. It turns out this is due to the tokenizer being a trigram one. I am looking into how to avoid tokens that would cause syntax errors. Could you please send me the data for the particular object you are testing?

@oldmoe

oldmoe commented Nov 4, 2023

I have just pushed a change that should fix the issue, but I am not sure of the quality of the similarity search using the terms stored in the trigram-tokenized index; a porter or unicode tokenizer will yield much better similarity results. I think I will need to reconsider how similarity is implemented for trigram indexes specifically.
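For readers wondering how trigram tokens can lead to syntax errors, the plain-Ruby sketch below shows one plausible failure mode and a common guard. It is an assumption about the problem, not the change that was actually pushed to litestack; the helper names `trigrams`, `fts5_quote`, and `trigram_or_query` are hypothetical, and lowercasing is a simplification. A trigram can contain characters such as `"`, `+`, or `-` that have meaning in a full-text query, so each token is wrapped as a quoted string with embedded double quotes doubled.

```ruby
# Split text into overlapping 3-character tokens, roughly what a trigram
# tokenizer stores. Lowercasing here is a simplification (assumption).
def trigrams(text)
  chars = text.downcase.chars
  return [] if chars.length < 3
  chars.each_cons(3).map(&:join)
end

# Quote a token as a string literal for the query: wrap it in double
# quotes and double any embedded double quote.
def fts5_quote(token)
  %("#{token.gsub('"', '""')}")
end

# Join the distinct, quoted trigrams into a single OR query.
def trigram_or_query(text)
  trigrams(text).uniq.map { |t| fts5_quote(t) }.join(" OR ")
end

puts trigram_or_query("c++ tips")
```

Even with safe quoting, trigrams carry little meaning on their own, which fits the concern above that word-level tokenizers give better similarity results.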

@adrienpoly
Owner Author

The data can be found in /data: https://github.com/adrienpoly/rubyvideo/tree/main/data

All the videos.yml files there are indexed by the Talk model.

@adrienpoly
Owner Author

If you run this branch, a simple rails db:create db:seed followed by bin/dev should get you up and running.

Then you can update the related_talks method to use Litesearch's similar.
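As a rough idea of what that change could look like, here is a sketch. Only `similar(limit)` comes from this thread; the `RelatedTalks` module, the `limit: 6` default, and the empty-list fallback are hypothetical, and in the app the method would presumably live directly on the Talk model.

```ruby
# Hypothetical mixin standing in for the Talk model in this sketch.
module RelatedTalks
  # Delegates to `similar(limit)` as described earlier in this thread.
  # Falls back to an empty list when the installed Litesearch version
  # does not provide `similar` (assumption, e.g. the released gem).
  def related_talks(limit: 6)
    return [] unless respond_to?(:similar)
    similar(limit)
  end
end
```

The fallback keeps the page rendering on a gem version without similarity search, at the cost of showing no related talks.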

@adrienpoly
Owner Author

closing for now

@adrienpoly adrienpoly closed this Jun 11, 2024