Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/ogc 508 replace elastic search by postgres text search #918

Open
wants to merge 207 commits into
base: master
Choose a base branch
from

Conversation

Tschuppi81
Copy link
Contributor

INITIAL FEEDBACK TO TEXT SEARCH IN POSTGRES - WORK IN PROGRESS

Feature/ogc 508 replace elastic search by postgres text search

core: Replace elastic search by postgres full text search

With this changes elastic search (es) is still in place and productive while we introduce the postgres full text search accessible using 'search_postgres' keyword in the url.
es url: http://localhost:8080/onegov_agency/bs/search?q=Arnold
psql url: http://localhost:8080/onegov_agency/bs/search_postgres?q=Arnold

hint: onegov-core upgrade

Main changes:

  • Re-indexing
    ../onegov-cloud/src/onegov/search/cli.py
  • Upgrade script (uses the same code as re-indexing)
    ../onegov-cloud/src/onegov/search/upgrade.py
  • Extension of mixin class Searchable
    .. /onegov-cloud/src/onegov/search/mixins.py
  • Adjusting all searchable models adding the fts index in postgres

Open Items:

  • unit tests are not reworked yet
  • language configuration
  • improve ranking

Checklist

  • I have performed a self-review of my code
  • I considered adding a reviewer
  • I have added an upgrade hint such as data migration commands to be run
  • I made changes/features for both org and town6
  • I have tested my code thoroughly by hand
    • I have tested database upgrades
  • I have added tests for my changes/features

Tschuppi81 and others added 30 commits April 17, 2023 05:12
Enables bugbear in pre-commit and CI linting, also introduces a garbage collector friendly LRU cache variant.

TYPE: Feature
LINK: OGC-1052
I also added a db upgrade job to add the columns to the two tables. In addition it will parse the people.address column and separate the information in the newly added fields.

TYPE: Feature
LINK: ogc-966
HINT: Run onegov-people --select /onegov_agency/* migrate-people-address-field --dry-run after upgrade
TYPE: Bugfix
TYPE: Feature
LINK: PRO-1173
The attendee receives a notification on registration or cancellation of their participation.

TYPE: Feature
LINK: PRO-1126
TYPE: Feature
LINK: OGC-746
Fix root-level page interpretation bug for news, 
which was mistakenly being treated as falsy (index 0). 

TYPE: Bugfix
LINK: OGC-863
TYPE: Feature
LINK: PRO-1126
TYPE: Bugfix
LINK: OGC-1073
Slightly larger page text
Version number in footer
Hover effect on Navigation

TYPE: Feature
@msom msom removed their request for review August 14, 2023 09:15
Copy link
Member

@Daverball Daverball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably break a lot of things (anything that uses directory_id in a query) and also does not do what you want it to do. The reason this previously failed for search indexing is simply because to_tsvector is not implemented for UUID so it needs to first be converted to TEXT, however, I'm not sure if the format is the same between Postgres and Python (i.e. if there will be dashes or not in the resulting string if you do a cast).

Type coercion is going to be tricky in general and doing it automatically might not be possible, although you may be able to do something with CASE and pg_typeof, instead of the the coalesce you've been doing to handle NULL.

to_tsvector definitely works for strings and JSON, although in the case of JSON i'm not sure what will end up in the search index, if it's just the values or also the keys. I don't remember if it works for array and hash, but you will need to figure out a way to make all the various cases work correctly.

Also removing all those hybrid_property is also not what you want to do to make this work, but you will need to figure out a way to generate an equivalent SQL expression for what the hybrid_property returns, so it can end up in the search index. In some cases this might not be possible and we may need a computed column to calculate the value in-app and store this value in the database.

Another route would be to forego the automatically updated index and instead generate and update it in-app by subscribing to the relevant orm events, as we have been doing previously for elastic search. This would mean that in-app we pull out all the string values that should end up in the index and then update the tsvector column based on those values.

@Tschuppi81
Copy link
Contributor Author

@Daverball In the above commit I am fighting with more complex hybrid expressions. Most of them gather data from other tables and I could not figure out how to properly join the corresponding tables. Currently the 'reindexing' fails in CourseEvent.
Note: This commit to be reverted - just shows the join problem...

@Daverball
Copy link
Member

Daverball commented Nov 27, 2023

@Tschuppi81 This is going to be tricky in general and probably a bad idea in terms of performance anyways, you are going to be better off adding caches for the joined attributes and using @observes methods to keep the caches up to date.

Or we could give up on calculating the search index online, since we're still quite a ways off and just generate a dictionary offline of all the search values which then can be turned into a TSVECTOR directly if interpreted as JSONB. While somewhat inefficient, this would still be quicker than what we had before with elasticsearch, since we save ourselves from having to talk with the elasticsearch server. It would also get rid of the issues related to polymorphism, since SQLAlchemy is aware of what polymorphic identity we're talking about, while in postgres we would have to write CASE statements to distinguish the various versions in the same table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants