Orleans Indexing and Lucene.Net #4
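// Requires: using System.Threading; using System.Threading.Tasks;
public async Task GrainTest()
{
    var grain = new IndexGrain();
    await grain.OnActivateAsync();
    int count = 0;
    int foundCount = 0;
    await Task.WhenAll(
        // Writer: index 150 documents, one field each.
        Task.Run(async () =>
        {
            for (int i = 0; i < 150; i++)
            {
                var doc = new GrainDocument(i.ToString());
                doc.LuceneDocument.Add(new StringField("property", $"i={i}", Field.Store.YES));
                await grain.WriteIndex(doc);
                // Both tasks update count concurrently, so increment atomically.
                Interlocked.Increment(ref count);
            }
        }),
        // Reader: start one second later and query 300 values; only the first 150 exist.
        Task.Run(async () =>
        {
            await Task.Delay(1000);
            for (int i = 0; i < 300; i++)
            {
                var doc = await grain.QueryByField("property", $"i={i}");
                Interlocked.Increment(ref count);
                if (doc.TotalHits > 0)
                {
                    foundCount += 1; // only this task touches foundCount
                }
            }
        }));
    await grain.OnDeactivateAsync();
    count.Should().Be(450);
    foundCount.Should().Be(150);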
}
In this test, of course, I create the indexes directly in Lucene.Net, which is not convenient.
I tried to implement Lucene full-text search based on Orleans and cloud storage providers, and I kind of failed. The problems I faced were around performance.
Achieving high performance and stability is very challenging, especially because Orleans applications are deployed much more often than a database. If you can achieve that, it would be great, but I lost data from time to time and therefore decided to go with Elastic or a database full-text system.
@SebastianStehle Can I ask you to disclose the details of your implementation?
Hi, my implementation has been removed now, but it is open source: https://github.com/Squidex/squidex/tree/8e088beb1c91626d1f67ec8a09f2b80740639054/backend/src/Squidex.Domain.Apps.Entities/Contents/Text/Lucene
I think the most important class is this one: https://github.com/Squidex/squidex/blob/8e088beb1c91626d1f67ec8a09f2b80740639054/backend/src/Squidex.Domain.Apps.Entities/Contents/Text/Lucene/IndexManager.cs
It manages the indexes in case a grain gets deactivated and the index is not committed.
@KSemenenko Conceptually, I don't see a problem with the idea. My intuition is more aligned with @SebastianStehle though. For a limited scale and load, holding and updating indices in memory will probably work. But in a production setting I'd be nervous about the lack of separation of concerns and about sharing memory/CPU resources with Lucene.Net in the same process. For production use, I'd look at offloading indexing to something like Elastic, or at least hosting the Lucene.Net indexing code in a separate process. Disclaimer: I've never used Lucene.Net. My thoughts here are purely intuitive speculation, FWIW.
That's an interesting thought, @sergeybykov. An interface for writing data, and an IQueryable interface for making queries.
Although, for example, I keep my silos in a Kubernetes cluster and I have no problem adding a couple of virtual machines.
Yes, an interface with pluggable implementations would be the way to go.
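To make the "interface with pluggable implementations" idea concrete, here is a minimal sketch. All names (`IndexEntry`, `IIndexWriter`, `IIndexReader`, `InMemoryIndex`) are hypothetical and not part of Orleans or of the prototype discussed here; the in-memory backend only stands in for where a Lucene.Net or Elastic implementation could be plugged in.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical document shape: a grain key plus its indexed field values.
public sealed record IndexEntry(string GrainKey, IReadOnlyDictionary<string, string> Fields);

// Write side: grains push their indexed fields through this interface.
public interface IIndexWriter
{
    Task WriteAsync(IndexEntry entry);
}

// Read side: callers compose LINQ queries over the indexed entries.
public interface IIndexReader
{
    IQueryable<IndexEntry> Query();
}

// Simplest possible backend, useful for tests. Another backend would
// implement the same two interfaces and be swapped in by configuration.
public sealed class InMemoryIndex : IIndexWriter, IIndexReader
{
    private readonly ConcurrentDictionary<string, IndexEntry> _entries = new();

    public Task WriteAsync(IndexEntry entry)
    {
        // Last write wins for a given grain key.
        _entries[entry.GrainKey] = entry;
        return Task.CompletedTask;
    }

    // Values returns a snapshot, so queries do not observe later writes.
    public IQueryable<IndexEntry> Query() => _entries.Values.AsQueryable();
}
```

A caller would depend only on `IIndexWriter` and `IIndexReader`, for example `index.Query().Where(e => e.Fields["city"] == "Berlin")`, so the choice between in-process and out-of-process indexing stays a deployment decision.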
When you talk about indexes you have basically 2 options:
Lucene and databases use the second approach because the goal is to work with large data sets. I thought the goal of this project was to work on top of key-value stores and follow the first approach. If we use the database for queries, why do we need Orleans Indexing at all? It would be far easier and more efficient to use the stored state directly, perhaps with a mapping function for indexes: https://github.com/sebastienros/yessql/wiki/Tutorial#creating-mapped-index
A separate index has the big problem that it can get out of sync with the original data, especially if you do not use transactions.
To give you an example, I have thousands of users who all have their geo position. I see indexing as a convenient abstraction over the storage/database, and a fairly powerful search system. Yesterday I thought it would be cool to have the grain itself take care of the index updates. For example, we could write a post handler that writes variables marked with an attribute to the index when a grain method has finished. Or we could generate something like INotifyPropertyChanged and watch for changes to the variables. In general, it would be as abstract as grain state, but only for indexing.
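A rough sketch of the attribute idea, under stated assumptions: `IndexedAttribute`, `UserState`, and `IndexedFieldExtractor` are made-up names for illustration, and the hook that would call this after a grain method finishes (a call filter, for instance) is left out. The sketch only shows collecting the marked properties and detecting which of them changed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker: properties carrying it are sent to the index.
[AttributeUsage(AttributeTargets.Property)]
public sealed class IndexedAttribute : Attribute { }

// Example state for the geo-position case.
public sealed class UserState
{
    [Indexed] public double Latitude { get; set; }
    [Indexed] public double Longitude { get; set; }
    public string Nickname { get; set; } = ""; // not indexed
}

public static class IndexedFieldExtractor
{
    // Collects name/value pairs for every [Indexed] public property.
    public static Dictionary<string, object> Extract(object state) =>
        state.GetType()
             .GetProperties(BindingFlags.Public | BindingFlags.Instance)
             .Where(p => p.GetCustomAttribute<IndexedAttribute>() != null)
             .ToDictionary(p => p.Name, p => p.GetValue(state));

    // Returns only the fields whose value differs from the earlier snapshot.
    public static Dictionary<string, object> Diff(
        IReadOnlyDictionary<string, object> before,
        IReadOnlyDictionary<string, object> after) =>
        after.Where(kv => !before.TryGetValue(kv.Key, out var old)
                          || !object.Equals(old, kv.Value))
             .ToDictionary(kv => kv.Key, kv => kv.Value);
}
```

Taking a snapshot with `Extract` before the grain method runs and another one afterwards, then sending only the `Diff` to the index, would approximate the "watch for changes" behaviour without generating INotifyPropertyChanged code.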
You are talking about abstracting this into a custom grain. Then I am on your side ;)
Anyway, I've been waiting for this functionality for years, and I've thought a lot about it.
After studying the original documents and the code, I came to the conclusion that it would be better to use #3 Lucene.Net for indexing. ElasticSearch, for example, does this too. With grains, it is easy to do index clustering.
On GitHub there is code to support LINQ queries, or code for storing files in Azure Storage.
I think that it is possible to use a special Grain which will keep track of a certain type of grain and index the necessary fields.
I can use a service, and thus have indexes on each Silo.
So as soon as I had time, I built a prototype.
What do you think about Lucene.Net? @ReubenBond @sergeybykov @philbe
If you like this idea I can go ahead and make this code stable.