Fix keeping deleted timelines in memory
Currently we store all timelines in memory, even deleted ones. This is needed to fix a bug with active computes for deleted timelines: an active compute can recreate a timeline after deletion, so without this fix timelines can reappear after being deleted.
The issue with this fix is that RAM usage grows indefinitely until restart. This can be addressed by using less memory for deleted timelines and/or by garbage-collecting timelines that were deleted a long time ago (~1 hour).
Fix timeline delete retries
When a user deletes a project, safekeepers receive a `delete_tenant` API call. The handler iterates over all timelines for this tenant and deletes them individually.
The issue here is that safekeepers don't track completed deletions, so on a retry some timelines can be deleted more than once. This is a problem when a tenant has thousands of timelines, because retrying the deletion of all of them takes too long.
#6766 fixes this by adding a flag inside a timeline. A slightly better fix would be to move this flag to `timelines_global_map.rs` instead, and not invoke timeline deletion more than once from the tenant deletion code there.
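The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the safekeeper code: `DeletionTracker`, the integer `TenantId`/`TimelineId` aliases, and the `delete_one` callback are all hypothetical names introduced for the example. It only shows the idea of recording completed deletions at the map level so that a retried `delete_tenant` skips work that has already been done.

```rust
use std::collections::HashSet;

// Hypothetical simplified ids; the real safekeeper uses its own id types.
type TenantId = u64;
type TimelineId = u64;

/// Hypothetical map-level record of timelines whose deletion has completed.
#[derive(Default)]
struct DeletionTracker {
    deleted: HashSet<(TenantId, TimelineId)>,
}

impl DeletionTracker {
    /// Deletes every listed timeline of `tenant`, skipping those already
    /// recorded as deleted. Returns how many deletions were actually run.
    /// Stops at the first failure so the caller can retry later.
    fn delete_tenant<F>(
        &mut self,
        tenant: TenantId,
        timelines: &[TimelineId],
        mut delete_one: F,
    ) -> Result<usize, String>
    where
        F: FnMut(TimelineId) -> Result<(), String>,
    {
        let mut performed = 0;
        for &timeline in timelines {
            if self.deleted.contains(&(tenant, timeline)) {
                // Completed on a previous attempt: do not delete again.
                continue;
            }
            delete_one(timeline)?;
            // Record completion only after the deletion succeeded.
            self.deleted.insert((tenant, timeline));
            performed += 1;
        }
        Ok(performed)
    }
}
```

With this shape, a retry after a partial failure only touches the timelines that were not yet deleted, so the cost of a retry is proportional to the remaining work rather than to the total number of timelines.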
## Problem
Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.
Closes: #6810
## Summary of changes
- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)
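
A rough sketch of the two changes listed above, assuming simplified stand-in types: `TenantTimelineId`, `Timeline`, `GlobalTimelines` and the method names here are illustrative and not taken from the safekeeper source. It shows a separate tombstone map holding only the id and deletion time, plus a housekeeping pass that drops tombstones older than the TTL.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the tenant/timeline id pair ("ttid").
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TenantTimelineId {
    tenant_id: u128,
    timeline_id: u128,
}

/// Hypothetical stand-in for the heavyweight in-memory timeline state.
struct Timeline;

struct GlobalTimelines {
    /// Live timelines only, so lookups never see deleted entries.
    timelines: HashMap<TenantTimelineId, Timeline>,
    /// One small record per deleted timeline: the id and when it was deleted.
    tombstones: HashMap<TenantTimelineId, Instant>,
    tombstone_ttl: Duration,
}

impl GlobalTimelines {
    fn new(tombstone_ttl: Duration) -> Self {
        Self {
            timelines: HashMap::new(),
            tombstones: HashMap::new(),
            tombstone_ttl,
        }
    }

    /// Refuses to (re)create a timeline that still has a tombstone.
    fn create(&mut self, ttid: TenantTimelineId) -> Result<(), String> {
        if self.tombstones.contains_key(&ttid) {
            return Err(format!("timeline {:?} was deleted", ttid));
        }
        self.timelines.entry(ttid).or_insert(Timeline);
        Ok(())
    }

    /// Drops the timeline and leaves a tombstone behind.
    /// Returns false if the timeline was already tombstoned.
    fn delete(&mut self, ttid: TenantTimelineId, now: Instant) -> bool {
        if self.tombstones.contains_key(&ttid) {
            return false;
        }
        self.timelines.remove(&ttid);
        self.tombstones.insert(ttid, now);
        true
    }

    /// Lookup goes straight to the live map.
    fn get(&self, ttid: &TenantTimelineId) -> Option<&Timeline> {
        self.timelines.get(ttid)
    }

    /// Housekeeping pass: removes tombstones at least as old as the TTL.
    /// Returns how many were removed.
    fn housekeeping(&mut self, now: Instant) -> usize {
        let before = self.tombstones.len();
        let ttl = self.tombstone_ttl;
        self.tombstones
            .retain(|_, deleted_at| now.duration_since(*deleted_at) < ttl);
        before - self.tombstones.len()
    }
}
```

In this sketch the caller passes `now` explicitly, which keeps the TTL logic easy to test; a periodic task would simply call `housekeeping(Instant::now())`. Note the trade-off implied by any TTL: once a tombstone expires, nothing in this structure prevents the timeline from being created again.
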
I think this also makes #6766 unnecessary, as the tombstone is also checked during deletion.
I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.