Elastic Search Integration #4883
Replies: 7 comments 10 replies
-
For common use cases
For the Input interface
For the output
For the C# library
-
My thoughts:
Use cases
I thought I would start with our anticipated use cases. One of the reasons we want to integrate with Elasticsearch (ES) is that we want to combine search results from ES with authoritative data from SQL Server. A concrete example would be: "Find records that contain the string 'app' in any one of 3 fields and that are also assigned to the current user." To further complicate matters, we need to be able to disable certain operators or fall back to SQL Server in the event our near-real-time search index becomes not-so-near-real-time. We would like to hide the complexity of switching back and forth between data sources so that the consumer does not notice any difference. I am still new to the internals, so forgive me if I am way off in the following:
Input
Having a default operators set such as
Output
Enriching results with ES metadata is exactly what we would be looking for: match document count, aggregates, score, highlighting, etc. Being able to define what these are up front in our schema is valuable in our case because we don't want to tie each of these metadata items to ES.
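The fallback described in the use cases could be sketched roughly as follows. This is only an illustration under stated assumptions: `Row`, `IRecordSearch`, `InMemorySearch` and `FallbackSearch` are hypothetical names, a single text field stands in for the three searchable fields, and the index lag is supplied by a delegate rather than measured against a real cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; a real record would carry the three searchable fields.
public record Row(int Id, string Text, string AssignedTo);

// Hypothetical abstraction shared by the index-backed and the SQL-backed search.
public interface IRecordSearch
{
    IReadOnlyList<int> FindIds(string term, string userId);
}

// In-memory stand-in used here for both Elasticsearch and SQL Server.
public sealed class InMemorySearch : IRecordSearch
{
    private readonly IReadOnlyList<Row> _rows;

    public InMemorySearch(IEnumerable<Row> rows) => _rows = rows.ToList();

    public IReadOnlyList<int> FindIds(string term, string userId) =>
        _rows.Where(r => r.AssignedTo == userId
                      && r.Text.Contains(term, StringComparison.OrdinalIgnoreCase))
             .Select(r => r.Id)
             .ToList();
}

// Uses the index while it is fresh enough, otherwise falls back to the database.
public sealed class FallbackSearch : IRecordSearch
{
    private readonly IRecordSearch _index;
    private readonly IRecordSearch _database;
    private readonly Func<TimeSpan> _indexLag; // assumed way to observe index staleness
    private readonly TimeSpan _maxLag;

    public FallbackSearch(
        IRecordSearch index, IRecordSearch database,
        Func<TimeSpan> indexLag, TimeSpan maxLag)
    {
        _index = index;
        _database = database;
        _indexLag = indexLag;
        _maxLag = maxLag;
    }

    public IReadOnlyList<int> FindIds(string term, string userId) =>
        _indexLag() <= _maxLag
            ? _index.FindIds(term, userId)
            : _database.FindIds(term, userId);
}
```

Because the consumer only sees `IRecordSearch`, the switch between data sources stays invisible to it, which is the property asked for above.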
Other considerations:
-
I think we have a general idea of what the integration should look like from an API and conceptual perspective. The following will be addressed in the first iteration:
Future iterations will include (rough draft):
Input & Output
In the first iteration the query interface will look like the usual data interface (plus "hits" and "score"):
{
  users(where: {name: { eq: "Sam" }} ) {
    hits {
      total
    }
    edges {
      node {
        name
      }
      cursor
      score
    }
  }
}
This will be translated to an Elasticsearch query like the following:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Sam"
          }
        }
      ]
    }
  }
}
Driver
Which driver we use in the end is up for discussion. We will only really see just before or during the implementation what fits better.
C# API
The next step is to define what the C# interface should look like for types and resolvers.
Types
Data types are a different version of InputObjectTypes. A filter type can contain operations (like
Resolvers
Once we have defined how we declare the filter types, we have to define how we want to expose the filtering capabilities to the user. In the database implementations we usually return a
With Elasticsearch we have a three-phase query:
In C# code this could look something like this:
public class Query
{
    [UseFiltering<UserFilterInputType>]
    [UsePagination]
    public async Task<User[]> GetUsersAsync(
        [Service] IElasticSearchClient es, // NEST or ElasticSearch.NET or both
        [Service] UserService service,
        IResolverContext context, // we need this to access the inputs, we could also have something more explicit like SearchContext
        CancellationToken ct)
    {
        // Phase 1 is already executed by the middleware pipeline
        // Phase 2
        var searchResult = await es.SearchAsync(context, ct);
        // Phase 3
        return service.Search(searchResult); // you need to provide some examples of how you do this currently
    }
}
The open questions here are:
How do you query now? C# code examples
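To make the three phases from the resolver comments more concrete, here is a minimal sketch that uses in-memory stand-ins instead of a real Elasticsearch client and database. The phase descriptions are read off the comments in the resolver above; all type and method names (`NameQuery`, `IndexedUser`, `SearchHit`, `ThreePhaseSearch`) are hypothetical and not part of HotChocolate or NEST.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record User(int Id, string Name);
public record NameQuery(string NameEquals);                  // hypothetical query object
public record IndexedUser(int Id, string Name, double Score); // hypothetical indexed document
public record SearchHit(int Id, double Score);

public static class ThreePhaseSearch
{
    // Phase 1 (assumed): the middleware turns the "where" input into a query object.
    public static NameQuery BuildQuery(string nameEquals) => new NameQuery(nameEquals);

    // Phase 2: run the query against the index and keep only ids and scores.
    public static IReadOnlyList<SearchHit> Search(
        IEnumerable<IndexedUser> index, NameQuery query) =>
        index.Where(d => d.Name == query.NameEquals)
             .Select(d => new SearchHit(d.Id, d.Score))
             .OrderByDescending(h => h.Score)
             .ToList();

    // Phase 3: load the authoritative rows by id, preserving the index ranking.
    // Hits that no longer exist in the database are dropped.
    public static IReadOnlyList<User> Hydrate(
        IReadOnlyList<SearchHit> hits, IReadOnlyDictionary<int, User> database) =>
        hits.Where(h => database.ContainsKey(h.Id))
            .Select(h => database[h.Id])
            .ToList();
}
```

Keeping phase 3 separate means the database stays the source of truth: a stale document in the index can affect ranking, but never the returned data.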
-
Our Experience
We have a setup currently using ES with HotChocolate. While we talked early on about trying to set up an IQueryable of our own to help split logic between search and actual data return, we ended up with a sort of hybrid setup. We actually load far more of our data into Elastic than just the searchable fields -- enough to populate one of the "core" objects on our graph. From there, we use stitching to fill the gaps. It gives us a sort of best-of-both-worlds using the current setup. Just waiting for v13 and some of the stitching/federation improvements :)
Schema Definition
To define our ES schema we use strings. Initially this meant defining it as POCOs several times, since we had the definition for ES, the definition for our database, and the definition for HC. We've since shifted to a schema-first approach, as having to hydrate all those objects all over the place just isn't efficient, especially when you don't need the full object. For many operations, we don't actually hydrate an object until just before it goes out to the database. We've actually shifted completely away from .NET for populating the ES index just because of performance.
Driver Concerns
When you're looking at the driver, beware of NEST. Newer versions are no longer compatible with AWS/OpenSearch, and I suspect many users lean in that direction. It's an interesting time in this space, and it may not be good to hitch the wagons to one particular horse (if it can be avoided).
What we'd like to see
I love the idea of plugging in filters, pagination, and sorting, as we had to roll our own for each of these. While we do like a clean filter approach as mentioned above, having some help getting from there to the ES syntax would be fantastic. Maybe a way to specify filter options that are allowed? This seems especially relevant in a world where the schema in ES differs from the database schema.
With more teams adopting NoSQL databases, this is especially relevant given there can be more than one way to retrieve things, multiple representations of the data, or even related data that isn't related in an ORM sense.
-
So the whole OpenSearch story is a bit unfortunate. I also see that there is a new package for Elasticsearch 8 that is currently in alpha. It's difficult to find a "one-fits-all" client at the moment.
For filtering we have to define a data structure that we build during visitation of the input object. This can either be a custom data structure that we then apply to the driver, or we use the driver's data structure like Query (e.g. Filtering = Expressions, MongoDb = BsonDocument). It would be nice if we had interop between NEST and the low-level client. Is something like this possible? Can you build parts of the query with NEST and then "merge" them with the low-level client definition?
I imagine something like this could be possible:
public class BookFilterInputType : FilterInputType<Book>
{
    protected override void Configure(IFilterInputTypeDescriptor<Book> descriptor)
    {
        descriptor.Field(x => x.Title); // this adds all data operations for title (eq, neq, in, nin etc.)
        descriptor.Field(x => x.Description, field =>
        {
            field.Operation(ElasticOperation.Search).Name("search").Type<StringType>(); // e.g. wildcard search
            field.Query(
                "custom",
                (Query q, String v) => q.Match(m => m.Field("TextID").Query(v))); // custom operation with NEST interop
        });
    }
}
Example query:
{
  books(where: {title: {eq: "123"}, description: {search: "something", custom: "this will be in the v variable" } } ) {
    title
  }
}
Questions: What client are the OpenSearch people using?
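The idea of building a data structure while visiting the input object could look roughly like the sketch below. It assumes the "custom data structure" option and uses `System.Text.Json.Nodes` as that structure; `WhereVisitor` is a hypothetical name, the `where` input is simplified to a field -> operation -> value dictionary, and only `eq` is translated (to a `match` clause inside `bool`/`must`, mirroring the translation shown earlier in the thread).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

public static class WhereVisitor
{
    // Walks the simplified "where" input and collects one clause per operation.
    public static JsonObject ToElasticQuery(
        IReadOnlyDictionary<string, IReadOnlyDictionary<string, string>> where)
    {
        var must = new JsonArray();
        foreach (var (field, operations) in where)
        {
            foreach (var (operation, value) in operations)
            {
                must.Add(Translate(field, operation, value));
            }
        }

        return new JsonObject
        {
            ["query"] = new JsonObject
            {
                ["bool"] = new JsonObject { ["must"] = must }
            }
        };
    }

    // Only operations listed here are allowed; anything else is rejected.
    private static JsonNode Translate(string field, string operation, string value) =>
        operation switch
        {
            "eq" => new JsonObject { ["match"] = new JsonObject { [field] = value } },
            _ => throw new NotSupportedException(
                $"Operation '{operation}' is not enabled for field '{field}'.")
        };
}
```

Rejecting unknown operations in `Translate` is one simple way to get the "filter options that are allowed" behaviour asked for earlier in the thread.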
-
Thanks to @A360JMaxxgamer things are moving again: #4998
-
Hello. I was wondering if there is a plan to get this PR moving? Thank you.
-
Elastic Search Integration
We are looking into the development of an elastic search integration for HotChocolate.
The first step is to define what the integration should look like and in what direction we want to go.
Elastic Search has a broad range of applications, so we should focus on one thing and plan it out.
Common Elastic Search use cases
Elastic Search as a Search Index
Probably the most common use case is to use ElasticSearch solely for indexing data for search and use a database as the primary storage of the data.
Elastic Search as a Database
Elastic APM uses Elastic Search as its primary storage. This seems to be a more specialised use case.
In a first iteration we should focus on making the search index use case work.
GraphQL API
We have to define an interface for query input and query output. This should be the first step.
Input
We can integrate Elastic Search into the filtering API, and are theoretically not bound to the current format. The search API is very powerful; the question is how much of it we want to expose to the user.
A simple HotChocolate Data query looks like this:
Directly translated to elastic search this would be something like
But this is probably not what Elastic Search users would like to use. There are many ways to query Elastic Search that are not available in conventional databases.
As an example have a look at this query:
How do we expose this in a GraphQL API?
The simplest approach would be to just open up the query API:
But this exposes a big attack surface for e.g. denial of service attacks.
It is also not nice, as the front-end developer would have to decide what the best way to query Elastic Search is.
It would be much nicer if the possible searches were defined in the backend and a saner query were exposed to the front-end.
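One way to picture "searches defined in the backend" is a small registry of named searches, each of which knows how to build its own Elasticsearch query body. The sketch below is an assumption-laden illustration: `BookSearches`, the search names `title` and `anywhere`, and the field names are all hypothetical, and `System.Text.Json.Nodes` is used only to build the JSON, not to talk to a cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

public static class BookSearches
{
    // Hypothetical registry: search name -> builder for the Elasticsearch query clause.
    private static readonly Dictionary<string, Func<string, JsonObject>> Definitions = new()
    {
        ["title"] = text => new JsonObject
        {
            ["match"] = new JsonObject { ["title"] = text }
        },
        ["anywhere"] = text => new JsonObject
        {
            ["multi_match"] = new JsonObject
            {
                ["query"] = text,
                ["fields"] = new JsonArray("title", "description")
            }
        }
    };

    // The front-end only picks a search by name and supplies the text;
    // it never sends raw Elasticsearch queries.
    public static JsonObject Build(string searchName, string text)
    {
        if (!Definitions.TryGetValue(searchName, out var build))
        {
            throw new ArgumentException($"Unknown search '{searchName}'.", nameof(searchName));
        }

        return new JsonObject { ["query"] = build(text) };
    }
}
```

Since the set of searches is closed, the attack surface is limited to what the backend explicitly registered, and the choice of query type stays with the backend developer.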
This is a useful resource to find sample queries for elastic: 42 Example Queries
Output
Like the input, we must define how we want to expose the filtered data.
The easiest representation would be just as the resolved objects from the database.
But it could also be interesting to expose additional metadata.
An elastic search response looks like this:
Most of the properties are too low level to expose. But it could be interesting to expose parts of “hits” on a connection and maybe even the score on the edges:
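As a rough sketch of that mapping, the code below reads only the total and the per-hit score out of a search response and drops everything else. It assumes the Elasticsearch 7+ response shape, where `hits.total` is an object with a `value` property; the `Hits`, `Edge`, `SearchConnection` and `SearchResponseMapper` names are hypothetical and not an existing HotChocolate API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public record Hits(long Total);
public record Edge(string Id, double? Score);
public record SearchConnection(Hits Hits, IReadOnlyList<Edge> Edges);

public static class SearchResponseMapper
{
    public static SearchConnection Map(string responseJson)
    {
        using var document = JsonDocument.Parse(responseJson);
        var hits = document.RootElement.GetProperty("hits");

        // Assumes hits.total is { "value": n, "relation": "eq" } as in Elasticsearch 7+.
        var total = hits.GetProperty("total").GetProperty("value").GetInt64();

        // Materialize the edges before the document is disposed.
        var edges = hits.GetProperty("hits")
            .EnumerateArray()
            .Select(hit => new Edge(
                hit.GetProperty("_id").GetString() ?? string.Empty,
                ReadScore(hit.GetProperty("_score"))))
            .ToList();

        return new SearchConnection(new Hits(total), edges);
    }

    // _score can be null, for example when results are sorted by a field.
    private static double? ReadScore(JsonElement score) =>
        score.ValueKind == JsonValueKind.Number ? score.GetDouble() : (double?)null;
}
```

The node itself would still come from the database; only `Id` and `Score` are taken from the index here.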
Inspirations
Open Questions / Next Steps
Future Steps