
Feature: The ability to fold historical events into one another (to keep the Event Store size manageable) #179

@ReinderReinders

Description


This one is perhaps not a direct feature request but more of a concept I am thinking about and I thought I might share it here. Perhaps it doesn't really fit into the design and philosophy of Eventuous, but I'm curious what others might think about it.

I am currently building a multi-tenant cloud product that will use an Event Store (PostgreSQL) as one of its components. Multiple applications will subscribe to this Event Store (and each will also push Aggregates to it). The requirements demand that the Event Store be a historical system: other applications might be added to the product later, and these must be able to subscribe to the Event Store and build a full read model (including Aggregates that have not been updated in years). This means that I cannot use the 'Archive' feature as it is described in this section of the documentation, because the documentation warns against it.
To take the use case from the documentation as an illustration: some products might not care about the history (the Booking system is not interested in Bookings from five years ago), but for my product the history is a requirement. Perhaps not the entire history, though (see my idea below).
My concern is that, especially because the product will be multi-tenant, the Event Store will grow very large and this will hurt performance. This matters most in the case described above, where a new application is added and has to catch up on the entire history; that catch-up could eventually stretch into an import lasting days. We are already discussing mitigating strategies, such as a separate Event Store for each tenant (which is a good idea in any case), but even then I expect a large number of events per tenant. I could further subdivide a single tenant's data into multiple Event Stores (for instance, splitting the Aggregates 50-50 between two stores), but this would be difficult to configure and implement once the system is already in production. It is also hard to predict beforehand which domain Aggregates are likely to receive the most events, which makes a 50-50 split essentially a shot in the dark. And even with all mitigating strategies in place (one Event Store per tenant, holding only one of several domain sections from an application), the nature of a historical system means that twenty years from now there will still be a huge history.
However, my requirements only demand that a new application be able to rebuild the current state of the Aggregates; each individual event from five years ago (configurable, of course) is not interesting, only the result (the current state). So I am thinking about a concept of 'folding' old events into a single event that contains only the end result at that point in time.
An example (using Create, Update and Delete events and Entities instead of Aggregates since I don't (yet) use true DDD in the product):

jan 1st, 2020: Entity 1 Created (Name=Demo)
sep 1st, 2021: Entity 1 Updated (Description=Later Update)
jan 1st, 2022: Entity 1 Deleted

Execute the 'fold' over the Event Store with jan 1st, 2022 (midnight) as the cutoff parameter (i.e. all history before that moment may be folded).
Result:

jan 1st, 2022: Entity 1 Created (Name=Demo, Description=Later Update, HistoryFolded=jan 1st, 2022)
jan 1st, 2022 (but later; timestamp excluded for brevity): Entity 1 Deleted

The HistoryFolded field (or something like that) would tell a consuming application that no historical events are known from before jan 1st, 2022. This would be enough for the needs of my product.
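The fold in the example can be sketched in a few lines. This is a minimal illustration in Python, not Eventuous code (Eventuous is a .NET library and nothing here reflects its API); the `Event` type, the `fold_stream` function and the handling of deletions are hypothetical choices made for the sketch. It assumes one stream per Entity, events in timestamp order, and Created/Updated payloads that can be merged key by key. Created and Updated events before the cutoff are merged into a single Created event carrying a `HistoryFolded` marker, while a Deleted event is never folded, so the deletion date stays visible.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: "Created", "Updated" or "Deleted" plus a payload.
    kind: str
    timestamp: datetime
    data: dict[str, Any] = field(default_factory=dict)


def fold_stream(events: list[Event], cutoff: datetime) -> list[Event]:
    """Hypothetical fold: merge the leading Created/Updated events recorded
    before `cutoff` into one Created event; keep every other event as-is."""
    state: dict[str, Any] = {}
    split = 0
    for e in events:
        # Stop at the first event on/after the cutoff, or at a deletion.
        if e.timestamp >= cutoff or e.kind == "Deleted":
            break
        state.update(e.data)  # later values overwrite earlier ones
        split += 1
    if split == 0:
        return list(events)  # nothing to fold
    kept = events[split:]
    # Stamp the folded event no later than the first kept event, so the
    # stream stays in timestamp order even if a deletion preceded the cutoff.
    stamp = min(cutoff, kept[0].timestamp) if kept else cutoff
    folded = Event("Created", stamp, {**state, "HistoryFolded": stamp})
    return [folded, *kept]


# Usage with the history from the example above.
history = [
    Event("Created", datetime(2020, 1, 1), {"Name": "Demo"}),
    Event("Updated", datetime(2021, 9, 1), {"Description": "Later Update"}),
    Event("Deleted", datetime(2022, 1, 1, 9, 30)),
]
for e in fold_stream(history, cutoff=datetime(2022, 1, 1)):
    print(e.kind, e.timestamp, e.data)
```

With this history the sketch yields a Created event at jan 1st, 2022 holding Name, Description and HistoryFolded, followed by the untouched Deleted event, which matches the result above. Stamping the folded event with the earlier of the cutoff and the first kept event is just one way to preserve ordering when a deletion happened before the cutoff.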
The reason I want to retain a Created event is twofold: first, I need a place for the HistoryFolded field; second, one of my consuming applications wants to retain some fields even for deleted Entities (so the user might, for instance, be shown a view of deleted entries: "Entity 1, named "Demo", was deleted on jan 1st, 2022."). In other words, my consuming applications might still want to know that an Entity named Demo once existed but has since been deleted.
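On the consuming side, a read model could honor both points: it records `HistoryFolded` as the earliest moment for which history is known, and it keeps deleted Entities as tombstones instead of dropping them. This is again a hypothetical Python sketch rather than Eventuous code; `Event`, `EntityView`, `project` and `describe` are names invented for illustration, and each stream is assumed to start with a Created event.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, one stream per Entity.
    entity_id: str
    kind: str  # "Created", "Updated" or "Deleted"
    timestamp: datetime
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class EntityView:
    name: str
    history_known_from: Optional[datetime] = None  # taken from HistoryFolded
    deleted_on: Optional[datetime] = None  # tombstone instead of removal


def project(events: list[Event]) -> dict[str, EntityView]:
    """Hypothetical projection; assumes every stream starts with Created."""
    views: dict[str, EntityView] = {}
    for e in events:
        if e.kind == "Created":
            # A folded Created event carries HistoryFolded; an original one does not.
            views[e.entity_id] = EntityView(
                name=e.data.get("Name", ""),
                history_known_from=e.data.get("HistoryFolded"),
            )
        elif e.kind == "Updated" and "Name" in e.data:
            views[e.entity_id].name = e.data["Name"]
        elif e.kind == "Deleted":
            # Keep the row so the UI can still show what was deleted.
            views[e.entity_id].deleted_on = e.timestamp
    return views


def describe(entity_id: str, view: EntityView) -> str:
    if view.deleted_on is None:
        return f'{entity_id}, named "{view.name}", is active'
    return f'{entity_id}, named "{view.name}", was deleted on {view.deleted_on:%Y-%m-%d}'
```

Because the projection treats a folded Created event like any other Created event, a new application catching up on a folded stream ends up with the same view as one that replayed the full history, except that it knows history only from the `HistoryFolded` moment onward.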

I could implement something like this by creating an application that reads and folds events from Store 1 and writes my 'folded' Entities to Store 2, but this would still require downtime. All applications would have to be switched over from Store 1 to Store 2, and technically the entire system would have to be turned off during the operation to avoid missing new events written to Store 1 after a fold had already been executed (i.e. no application may append events to Store 1 while a fold is in progress). This would not be my chosen solution.

Could Eventuous possibly support something like a folding feature, or have I just pitched one of the cardinal sins of Event Sourcing? (I am still learning the concept and have not read all there is to read on the topic.)
One concern I can already identify: what would happen to a subscription that is currently (re)building a read model while a stream is being folded? You can't really shut down subscriptions at runtime, so there is no real way around the need for downtime.

I'm curious to see what others might think about this.
