Redis implementation of nodupe plugin #700

gcglinton · 2023-06-12T13:59:31Z

As we've been talking about for weeks now, there's a need to implement a Redis version of the NoDupe cache, so that SR3 can be made more stateless, and cleanly containerized.

Having now written a full test suite for the disk-based NoDupe cache (#697), I have a better understanding of how it works.

One of the main things I'm left wondering is whether it's a problem if items in the cache expire between on_housekeeping runs. I understand why that's not the model for the disk nodupe, because it would be too complicated/expensive to have a per-item TTL, but in Redis that would be pretty straightforward, as individual keys can have expiries set.
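
To make the per-key expiry idea concrete, here is a minimal sketch, not the code from the actual driver. It assumes the duplicate check is done with a single `SET key value NX EX ttl`, which redis-py exposes as `set(name, value, nx=True, ex=...)` and which returns `True` when the key was written and `None` when it already existed. `FakeRedis`, `is_new`, and the `nodupe:` key prefix are hypothetical names used only for illustration; the in-memory stand-in is there so the example runs without a server.

```python
import time


class FakeRedis:
    """In-memory stand-in for the one redis-py call used below (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # key -> absolute expiry time

    def set(self, name, value, ex=None, nx=False):
        now = self._clock()
        live = name in self._expiry and self._expiry[name] > now
        if nx and live:
            return None  # redis-py returns None when NX blocks the write
        self._expiry[name] = now + ex if ex is not None else float("inf")
        return True


def is_new(client, key, ttl_seconds):
    """Return True the first time `key` is seen within `ttl_seconds`."""
    # SET key 1 NX EX ttl: atomic "insert if absent" with a per-key expiry.
    return client.set(key, 1, nx=True, ex=ttl_seconds) is True


# Drive the fake with a controllable clock to show the expiry behaviour.
now = [0.0]
client = FakeRedis(clock=lambda: now[0])
first = is_new(client, "nodupe:abc", 300)   # True: not seen before
second = is_new(client, "nodupe:abc", 300)  # False: duplicate within the TTL
now[0] = 301.0
third = is_new(client, "nodupe:abc", 300)   # True: the key expired on its own
```

With a real server, the same `is_new` function should work unchanged with a `redis.Redis(...)` client, and no housekeeping pass is needed because the server drops each key when its TTL runs out.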

However, if that's not desired, then to keep the two implementations behaving the same, a little extra work will be needed in Redis to store the data efficiently while keeping it easy to retrieve and process.

petersilva · 2023-06-14T12:46:03Z

Whatever is less work is "better". ... Will it expire entries that no one looks up?

gcglinton · 2023-06-14T13:29:56Z

I'd say this way is less work.
The SR3 code is much simpler (404 lines in disk, 221 in Redis), since we don't have to do anything on_housekeeping, on_start, or on_stop.

Whether or not anyone looks up an entry, they expire when the TTL is up. But that doesn't really matter IMO, because a lookup causes it to get re-added anyway.

I could configure it to extend the TTL on lookup if needed, but that's not the way the disk one operates now, so it would be an even larger difference between the two implementations.
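
The difference between the two options can be sketched with a small pure-Python model (no Redis involved). `TtlCache` and `refresh_on_hit` are hypothetical names, and the model only illustrates the two expiry policies under discussion; it is not taken from either driver. In Redis terms, the refreshing variant would presumably amount to issuing an `EXPIRE` on the key whenever a lookup hits.

```python
class TtlCache:
    """Toy model of a duplicate-suppression cache with per-entry expiry."""

    def __init__(self, ttl, refresh_on_hit=False):
        self.ttl = ttl
        self.refresh_on_hit = refresh_on_hit
        self._expires_at = {}  # key -> time at which the entry lapses

    def check(self, key, now):
        """Return True if `key` is new (not a duplicate) at time `now`."""
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            if self.refresh_on_hit:
                # Optional policy: a hit pushes the expiry further out.
                self._expires_at[key] = now + self.ttl
            return False
        # Missing or expired: (re-)add the entry with a fresh TTL.
        self._expires_at[key] = now + self.ttl
        return True


fixed = TtlCache(ttl=300)
sliding = TtlCache(ttl=300, refresh_on_hit=True)
times = [0, 200, 400]
fixed_results = [fixed.check("k", t) for t in times]      # [True, False, True]
sliding_results = [sliding.check("k", t) for t in times]  # [True, False, False]
```

With a fixed TTL the entry lapses 300 seconds after it was first added, and the lookup at t=400 re-adds it. With refresh-on-hit, the lookup at t=200 extends the entry to t=500, so the same lookup at t=400 is still treated as a duplicate.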

petersilva · 2023-06-17T19:31:19Z

The lack of resolution in expiry was just a concession to efficiency (not checking every time we look up an entry).
It sounds like the simpler Redis implementation will just be plain better.

gcglinton · 2023-06-19T12:23:27Z

I did some quick tests thanks to @reidsunderland's help with a basic config, and as far as I can tell, the disk and Redis drivers work identically: both detect duplicate files and reject them.

I haven't yet taken the time to write a proper comparative unit test (comparing the disk result, the Redis result, and the expected value), but I think it should be pretty easy to do.

petersilva · 2023-06-26T02:21:53Z

part of 3.0.41

gcglinton mentioned this issue Jun 19, 2023

Redis driver for Nodupe #705

Merged

petersilva added the labels enhancement (New feature or request), likely-fixed (likely fix is in the repository, success not confirmed yet), NewUseCase (needed to address a use case we can't yet support), v3 (issue deferred to or affects version 3), and v3only (Only affects v3 branches) Jun 21, 2023

petersilva assigned gcglinton Jun 21, 2023

petersilva closed this as completed Jun 26, 2023
