First draft of 0004-greenstand-search-engine #7
base: main
* ElasticSearch - can integrate well with other products of the Elastic Stack like Kibana, Logstash. Easiest to experiment with, since there are free trials available for Elastic Cloud (managed ElasticSearch deployment)

## Considered Options
Did you consider Sphinx? https://stackshare.io/stackups/lucene-vs-sphinx I know it's great for docs search.
Or maybe something from this list? https://www.educba.com/elasticsearch-alternatives/
Thanks for the links, @mckornfield! I'll check these out and see if any fit better than ES
* Steep learning curve?
* Requires more experimentation on what architecture is the best for Greenstand's use case (i.e. search over multiple indexes vs. one index)
* Heavy memory usage (requires 4.0 GB RAM just for ElasticSearch, probably more for Kibana and Logstash) - can be expensive since it requires larger compute servers and this would need to remain on at all times.
Definitely costs a lot in terms of resources. Also, there's no good auth support in the free versions of ELK.
For ELK, I believe we can set up service accounts to request and use tokens for authorization to pass requests to the Elastic cluster: https://www.elastic.co/guide/en/elasticsearch/reference/current/token-authentication-services.html. This doesn't seem to be limited to Elastic Cloud (which is just a managed deployment of the ELK stack).
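To make the token flow concrete, here is a minimal sketch of how a client might build the request. It assumes Elasticsearch's get-token endpoint (`POST /_security/oauth2/token`) with the `password` grant, and the host name and credentials are placeholders. It only constructs the request and does not send it. Whether the token service is available on a given deployment depends on its security configuration, which this sketch does not check.

```python
import json
import urllib.request


def build_token_request(base_url, username, password):
    """Build (but do not send) a request to the Elasticsearch get-token API.

    base_url, username and password are placeholders for illustration.
    """
    body = json.dumps(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_security/oauth2/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_header(access_token):
    """Header to attach to later search requests once a token has been issued."""
    return {"Authorization": "Bearer " + access_token}


# Hypothetical host and credentials, for illustration only.
req = build_token_request("https://es.example.org:9200/", "search_service", "change-me")
```

The `access_token` field of the JSON response would then be passed to `bearer_header` for subsequent requests.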
## Considered Options

* ElasticSearch
Did you spike these with a sample dataset?
Yes, I took about 20 rows from the `public.planters` and `public.trees` tables and all the rows from the `public.organizations` table in the `treetracker` database. I tested autocomplete/search hinting queries on three separate indexes (1 for each table) and on one single index that contained all three types of data rows (planters, trees, organizations).
## Decision Drivers

* ElasticSearch - can integrate well with other products of the Elastic Stack like Kibana, Logstash. Easiest to experiment with, since there are free trials available for Elastic Cloud (managed ElasticSearch deployment)
So we just got rid of our ELK stack, which we were using for consolidated logging of microservices. It was very difficult for the current cloud team to manage and to keep deployed in our cluster. I presume we would not need the whole ELK stack to achieve what you are looking to do here? Kibana really stressed our cloud resources. However, maybe there is a more stripped-down deployment option that would meet your use case.
* ElasticSearch
* Apache Solr
* Apache Lucene
Can you say more about why the Apache projects were not chosen? I don't have experience with either, but I do know that CKAN (our chosen data portal) uses Solr.
@kparikh9 We generally seek to pursue build before buy and self-management of our application platform; however, it seems like you have a quick solution here that adds some nice value. I am landing on cautious support for this plan, but I'd like to ask that we incorporate into this ADR some longer-range thinking about bringing the search engine into our cloud in the future, without using Kibana. If both the philosophy at the start of this paragraph and the longer-term plan to in-house the solution are articulated in the ADR, I would be happy to support and accept this decision.
@kparikh9 sorry for the delay. Do you also want to give Solr a try? I deployed a small Solr node, and it seems pretty interesting: https://dev-k8s.treetracker.org/search/solr/#/mycoll/query?q=publisher_s:*am*&q.op=OR&indent=true
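The link above opens the Solr admin UI. As a rough sketch, the same wildcard query could be issued programmatically against the collection's `/select` handler. The collection name (`mycoll`) and field (`publisher_s`) come from that link, while the helper name and the assumption that the handler is exposed at the same base path are mine. The snippet only builds the URL and does not send a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def solr_select_url(base_url, collection, query, default_op="OR"):
    """Build a URL for Solr's standard /select request handler."""
    params = {"q": query, "q.op": default_op, "indent": "true"}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Base path assumed to match the admin UI link; adjust for the real deployment.
url = solr_select_url(
    "https://dev-k8s.treetracker.org/search/solr", "mycoll", "publisher_s:*am*"
)
```

`urlencode` percent-encodes the `:` and `*` characters, so the wildcard term `*am*` arrives at Solr intact.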
I think Solr is more suitable for our case, IMO, because:
Our main goal here is full-text search over planter info, species, orgs, and other content (including searching across fields), plus autocompletion. Both Solr and ES can do the job, but Solr is a more dedicated search engine with advanced features (ES is more focused on log analysis, I think), as the creator of ES admits:
Here is another opinion:
I think these two have different focuses and use cases.
Because our goal is to index all Greenstand content, I don't think the scale of the data is huge, so I don't think we need the highly scalable, distributed solution that ES is good at, which comes at the cost of maintenance and complexity.
Solr is more open source than ES.
CC: @dadiorchen