Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis implementation of nodupe plugin #700

Closed
gcglinton opened this issue Jun 12, 2023 · 5 comments
Closed

Redis implementation of nodupe plugin #700

gcglinton opened this issue Jun 12, 2023 · 5 comments
Assignees
Labels
enhancement New feature or request likely-fixed likely fix is in the repository, success not confirmed yet. NewUseCase needed to address a use case, we can't yet support. v3only Only affects v3 branches. v3 issue deferred to or affects version 3

Comments

@gcglinton
Copy link
Contributor

As we've been talking about for weeks now, there's a need to implement a Redis version of the NoDupe cache, so that SR3 can be made more stateless, and cleanly containerized.

Having now written a full test suite for the disk-based NoDupe cache (#697), I have a better understanding of how it works.

One of main things I'm left wondering, is whether it's a problem if items in the cache expire between on_housekeeping runs. I understand why that's not the model for the disk nodupe, because it would be too complicated/expensive to have a per-item TTL, but in Redis that would be pretty straight forward, as individual keys can have expiries set.

However, if that's not desired, in order to keep the functioning of the two implementations the same, then a little bit of extra work will need to be done in Redis to most efficiently store the data, while making it easy to retrieve and process.

@petersilva
Copy link
Contributor

petersilva commented Jun 14, 2023 via email

@gcglinton
Copy link
Contributor Author

I'd say this way is less work.
The SR3 code is much simpler (404 lines in disk, 221 in Redis), since we don't have to do anything on_housekeeping, on_start, or on_stop.

Whether or not anyone looks up an entry, they expire when the TTL is up. But that doesn't really matter IMO, because a lookup causes it to get re-added anyway.

I could configure it to extend the TTL on lookup if needed, but that's not the way the disk one operates now, so it would be an even larger difference between the two implementations.

@petersilva
Copy link
Contributor

the lack of resolution in expiry was just a concession to efficiency (not checking every time we look any entry up.)
it sounds like the simpler redis implementation will just be plain better.

@gcglinton
Copy link
Contributor Author

I did some quick tests thanks to @reidsunderland's help with a basic config, and as far as I can tell, both the disk and Redis drivers work identically; detecting duplicate files, and rejecting them.

I haven't yet taken the time to write a proper comparative unit test (comparing disk, redis, and the expected value), but I think it should be pretty easy to do.

@petersilva petersilva added enhancement New feature or request likely-fixed likely fix is in the repository, success not confirmed yet. NewUseCase needed to address a use case, we can't yet support. v3 issue deferred to or affects version 3 v3only Only affects v3 branches. labels Jun 21, 2023
@petersilva
Copy link
Contributor

part of 3.0.41

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request likely-fixed likely fix is in the repository, success not confirmed yet. NewUseCase needed to address a use case, we can't yet support. v3only Only affects v3 branches. v3 issue deferred to or affects version 3
Projects
None yet
Development

No branches or pull requests

2 participants