Orleans Indexing and Lucene.Net #4

KSemenenko · 2021-09-05T20:28:54Z

Anyway, I've been waiting for this functionality for years, and I've thought a lot about it.
After studying the original documents and the code, I came to the conclusion that it would be better to use Lucene.Net (#3) for indexing. Elasticsearch, for example, is built on Lucene in the same way. With grains, index clustering is easy to do.
There is code on GitHub that supports LINQ queries, and code for storing index files in Azure Storage.
I think it is possible to use a special grain that keeps track of a certain type of grain and indexes the necessary fields.
I can also use a service, and thus have indexes on each silo.

So as soon as I had time, I built a prototype.

What do you think about Lucene.Net? @ReubenBond @sergeybykov @philbe
If you like this idea I can go ahead and make this code stable.

KSemenenko · 2021-09-05T20:33:14Z

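public async Task GrainTest()
{
    // The grain is created directly (no silo), so activation is driven by hand.
    var grain = new IndexGrain();

    await grain.OnActivateAsync();

    // count is incremented from both tasks, so it is updated with
    // Interlocked (System.Threading) to avoid lost increments.
    int count = 0;
    int foundCount = 0;

    await Task.WhenAll(Task.Run(async () =>
    {
        // Writer: index 150 documents with one stored string field each.
        for (int i = 0; i < 150; i++)
        {
            var doc = new GrainDocument(i.ToString());
            doc.LuceneDocument.Add(new StringField("property", $"i={i}", Field.Store.YES));
            await grain.WriteIndex(doc);
            Interlocked.Increment(ref count);
        }
    }),
    Task.Run(async () =>
    {
        // Reader: give the writer a head start, then query 300 values,
        // of which only the first 150 exist. This assumes the writer
        // stays ahead of the reader.
        await Task.Delay(1000);
        for (int i = 0; i < 300; i++)
        {
            var doc = await grain.QueryByField("property", $"i={i}");
            Interlocked.Increment(ref count);

            if (doc.TotalHits > 0)
            {
                foundCount += 1;
            }
        }
    }));

    await grain.OnDeactivateAsync();

    // 150 writes + 300 queries, and one hit per written document.
    count.Should().Be(450);
    foundCount.Should().Be(150);
}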

In this test I create the index documents directly with Lucene.Net, which is not convenient.
For real use, all of this should be hidden behind wrapper methods, with LINQ support added for queries.
Then we'll have something even better than Elasticsearch.

SebastianStehle · 2021-09-05T21:48:41Z

I tried to implement Lucene full-text search based on Orleans and cloud storage providers, and I kind of failed. The problems I faced were around performance:

You need central storage for your Lucene indexes. You can implement the index Directory on top of Azure Blob Storage or similar, but it is relatively slow. In my experience it was much faster to periodically back up the snapshots by putting them in an archive and sending that over. In combination with a remote disk that works as a backup, the writes are not safe.
Lucene is not built for committing each document individually. If you want high performance, you need to make the changes in batches.
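
To illustrate the batching point, here is a minimal sketch of a buffer that collects documents and hands them to a commit callback one batch at a time. `BatchingIndexBuffer` and its callback are hypothetical names, not part of Orleans or Lucene.Net; with Lucene.Net the callback would presumably add the documents to an `IndexWriter` and call `Commit()` once per batch. It is not thread-safe, on the assumption that it lives inside a single grain.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Orleans or Lucene.Net type): buffers documents
// and passes them to a commit callback in batches instead of one by one.
public sealed class BatchingIndexBuffer<TDocument>
{
    private readonly List<TDocument> _pending = new List<TDocument>();
    private readonly Action<IReadOnlyList<TDocument>> _commitBatch;
    private readonly int _batchSize;

    public BatchingIndexBuffer(int batchSize, Action<IReadOnlyList<TDocument>> commitBatch)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _batchSize = batchSize;
        _commitBatch = commitBatch ?? throw new ArgumentNullException(nameof(commitBatch));
    }

    // Number of documents accepted but not yet committed.
    public int PendingCount => _pending.Count;

    public void Add(TDocument document)
    {
        _pending.Add(document);
        if (_pending.Count >= _batchSize)
        {
            Flush();
        }
    }

    // Commits whatever is pending. Call this on deactivation or from a timer
    // so that a partially filled batch is not left uncommitted.
    public void Flush()
    {
        if (_pending.Count == 0) return;

        // Commit a copy first and clear only afterwards, so a failed commit
        // leaves the documents in the buffer for a retry.
        _commitBatch(_pending.ToArray());
        _pending.Clear();
    }
}
```

Anything still in the buffer when the process dies is lost, so this trades durability for throughput, which is the same risk described above.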

Achieving both high performance and stability is very challenging, especially because Orleans applications are deployed much more often than a database. If you can achieve that, it would be great, but I lost data from time to time and therefore decided to go with Elastic or a database full-text system.

KSemenenko · 2021-09-06T15:05:49Z

@SebastianStehle Can I ask you to share the details of your implementation?
Did you store/load the data in memory on activation and deactivation of the grain?
Did you have only one index, or did you use MultiIndex?

SebastianStehle · 2021-09-06T16:19:30Z

Hi, my implementation has been removed by now, but it is open source: https://github.com/Squidex/squidex/tree/8e088beb1c91626d1f67ec8a09f2b80740639054/backend/src/Squidex.Domain.Apps.Entities/Contents/Text/Lucene

I had multiple indexes, one grain per index.
The index was loaded from a central store like S3 to a local folder on activation.

I think the most important class is this one: https://github.com/Squidex/squidex/blob/8e088beb1c91626d1f67ec8a09f2b80740639054/backend/src/Squidex.Domain.Apps.Entities/Contents/Text/Lucene/IndexManager.cs

It manages the indexes in case a grain gets deactivated and the index is not committed.

sergeybykov · 2021-09-11T20:40:26Z

@KSemenenko Conceptually, I don't see a problem with the idea. My intuition is more aligned with @SebastianStehle though. For a limited scale and load, holding and updating indices in memory will probably work. But in a production setting I'd be nervous about the lack of separation of concerns and about sharing memory/CPU resources with Lucene.Net in the same process. For production use, I'd look at offloading indexing to something like Elastic, or at least hosting the Lucene.Net indexing code in a separate process.

Disclaimer: I've never used Lucene.Net. My thoughts here are purely intuitive speculation, FWIW.

KSemenenko · 2021-09-11T20:54:43Z

That's an interesting thought @sergeybykov
Maybe then we need some kind of abstraction, like the one you made for storing state.

An interface for writing data, and an IQueryable-based interface for making queries.
Then a basic in-memory implementation, for example backed by a List.
And then implementations of the interfaces for Redis, Cosmos DB and other databases?
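
Roughly what I have in mind, as a sketch only: every name here (`IIndexWriter`, `IIndexReader`, `InMemoryIndex`) is made up for illustration and none of them exists in Orleans. The in-memory version keeps entries in a `List` and exposes them through LINQ's `AsQueryable()`; a Redis or Cosmos DB provider would implement the same two interfaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical write side of the index abstraction.
public interface IIndexWriter<TEntry>
{
    Task WriteAsync(string grainId, TEntry entry);
    Task RemoveAsync(string grainId);
}

// Hypothetical read side: each entry is (grain id, indexed value).
public interface IIndexReader<TEntry>
{
    IQueryable<KeyValuePair<string, TEntry>> Query();
}

// Basic in-memory implementation backed by a List.
public sealed class InMemoryIndex<TEntry> : IIndexWriter<TEntry>, IIndexReader<TEntry>
{
    private readonly List<KeyValuePair<string, TEntry>> _entries =
        new List<KeyValuePair<string, TEntry>>();
    private readonly object _gate = new object();

    public Task WriteAsync(string grainId, TEntry entry)
    {
        lock (_gate)
        {
            // One entry per grain: replace any previous value.
            _entries.RemoveAll(e => e.Key == grainId);
            _entries.Add(new KeyValuePair<string, TEntry>(grainId, entry));
        }
        return Task.CompletedTask;
    }

    public Task RemoveAsync(string grainId)
    {
        lock (_gate)
        {
            _entries.RemoveAll(e => e.Key == grainId);
        }
        return Task.CompletedTask;
    }

    public IQueryable<KeyValuePair<string, TEntry>> Query()
    {
        lock (_gate)
        {
            // Query a snapshot so later writes do not disturb enumeration.
            return _entries.ToList().AsQueryable();
        }
    }
}
```

A caller would then write something like `index.Query().Where(e => e.Value == "Berlin").Select(e => e.Key)` to get the matching grain ids.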

KSemenenko · 2021-09-11T20:57:56Z

Although, for example, I run my silos in a Kubernetes cluster, and I have no problem adding a couple of virtual machines.
Right now I use Cosmos DB to store the index. I don't really like this solution, and I would still like to build a solution with real indexes.

sergeybykov · 2021-09-11T21:29:17Z

Yes, an interface with pluggable implementations would be the way to go.

SebastianStehle · 2021-09-12T14:09:53Z

When you talk about indexes you have basically 2 options:

Do everything in memory and use things like Dictionary or SortedDictionary in C#.
Try to find a solution that also works well when the majority of the data is still on disk, e.g. B+ trees or inverted indexes.

Lucene and databases use the second approach because the goal is to work with large data sets.
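
For the first approach, a minimal sketch of what such an in-memory index could look like. `SortedValueIndex` is a hypothetical name, and it assumes a single indexed value per grain. A `SortedDictionary` maps each value to the grain ids that hold it, and a plain `Dictionary` remembers the current value of each grain so that updates can remove the old entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index: indexed value -> ids of the grains holding it.
public sealed class SortedValueIndex<TValue> where TValue : IComparable<TValue>
{
    private readonly SortedDictionary<TValue, HashSet<string>> _byValue =
        new SortedDictionary<TValue, HashSet<string>>();
    private readonly Dictionary<string, TValue> _byGrain =
        new Dictionary<string, TValue>();

    public void Set(string grainId, TValue value)
    {
        Remove(grainId); // drop the previous value, if any
        if (!_byValue.TryGetValue(value, out var ids))
        {
            ids = new HashSet<string>();
            _byValue[value] = ids;
        }
        ids.Add(grainId);
        _byGrain[grainId] = value;
    }

    public void Remove(string grainId)
    {
        if (!_byGrain.TryGetValue(grainId, out var oldValue)) return;
        _byGrain.Remove(grainId);
        var ids = _byValue[oldValue];
        ids.Remove(grainId);
        if (ids.Count == 0) _byValue.Remove(oldValue);
    }

    // Exact-match lookup.
    public IReadOnlyCollection<string> Find(TValue value) =>
        _byValue.TryGetValue(value, out var ids) ? ids.ToArray() : Array.Empty<string>();

    // Inclusive range lookup. SortedDictionary enumerates in key order, so the
    // scan can stop at the first key above max. It has no seek operation,
    // though, so the scan still starts from the smallest key.
    public IEnumerable<string> FindRange(TValue min, TValue max)
    {
        foreach (var pair in _byValue)
        {
            if (pair.Key.CompareTo(min) < 0) continue;
            if (pair.Key.CompareTo(max) > 0) yield break;
            foreach (var id in pair.Value) yield return id;
        }
    }
}
```

This only works while the whole index fits in memory, which is exactly the limitation that the second approach is meant to avoid.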

I thought the goal of this project is to work on the Key-Value stores and follow the first approach. If we use the database for queries, why do we need Orleans Indexing at all? It would be far easier and more efficient to use stored states directly, perhaps with a mapping function for indexes? https://github.com/sebastienros/yessql/wiki/Tutorial#creating-mapped-index

A separate index has the big problem that it can get out of sync with the original data, especially if you do not use transactions.

KSemenenko · 2021-09-12T14:23:25Z

To give you an example, I have thousands of users who all have a geo position.
All communication with users goes through their grain, because it is the only source of up-to-date data.
I do have the user's location in the database, but that only serves as storage between grain activations.
Now I want to find all the users in an area and get their grain ids.
At the moment I have a table in Cosmos DB in which I store the geo position and the grain id.
Every time the geo position changes, I have to update that table in the database.

I see indexing as a convenient abstraction over the storage/database, and as a fairly powerful search system. Yesterday I thought it would be cool to have the grain itself take care of the index updates. For example, we could write a post-call handler that writes the variables marked with an attribute to the index when a grain method has finished.

We could also generate something like INotifyPropertyChanged and watch for changes to the variables. Or something like that.
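
A rough sketch of the attribute idea, with made-up names: `IndexedAttribute`, `IndexedFieldReader` and `UserState` are all hypothetical. It only shows how the marked properties could be collected by reflection; where to call it is a separate question. An Orleans incoming grain call filter (`IIncomingGrainCallFilter`) might be one place to run it after a method completes, but I have not tried that.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical marker for properties that should be written to the index.
[AttributeUsage(AttributeTargets.Property)]
public sealed class IndexedAttribute : Attribute
{
}

public static class IndexedFieldReader
{
    // Returns name -> current value for every public instance property
    // marked with [Indexed] on the given state object.
    public static Dictionary<string, object> Read(object state)
    {
        if (state == null) throw new ArgumentNullException(nameof(state));

        var fields = new Dictionary<string, object>();
        var properties = state.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (var property in properties)
        {
            if (property.GetCustomAttribute<IndexedAttribute>() == null) continue;
            // Skip write-only properties and indexers.
            if (!property.CanRead || property.GetIndexParameters().Length > 0) continue;

            fields[property.Name] = property.GetValue(state);
        }

        return fields;
    }
}

// Example state for the geo position case: only the coordinates are indexed.
public sealed class UserState
{
    [Indexed] public double Latitude { get; set; }
    [Indexed] public double Longitude { get; set; }
    public string DisplayName { get; set; }
}
```

Calling `IndexedFieldReader.Read(state)` after each grain method and writing the result to the index would keep the index updated without the grain code doing it by hand. Reflection on every call is not free, so a real version would probably cache the property list per type.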

In general, it would be an abstraction just like grain state, only for indexing.

SebastianStehle · 2021-09-13T07:09:56Z

You are talking about an abstraction on top of a custom grain. Then I am on your side ;)

KSemenenko added 5 commits April 6, 2021 10:58

update nuget (add4199)
Luciene (3095207)
GRain (6bfbd7a)
tests (ac74e7f)
InMemory (7418e01)

KSemenenko changed the title ~~Orleans Indexing and Luciene~~ Orleans Indexing and Lucene.Net Sep 5, 2021

SebastianStehle mentioned this pull request Apr 26, 2022

Lucene.net + GrainServices #3

Open
