Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relaxed RDB version check #1604

Merged
merged 4 commits into from
Jan 27, 2025

Conversation

zuiderkwast
Copy link
Contributor

New config rdb-version-check with values:

  • strict: Reject future RDB versions.
  • relaxed: Try parsing future RDB versions and fail only when an unknown RDB opcode or type is encountered.

This can make it possible for Valkey 8.1 to try read a dump from for example Valkey 9.0 or later on a best-effort basis. The conditions for when this is expected to work can be defined when the future Valkey versions are released. Loading is expected to fail in the following cases:

  • If the data set contains any new key types or other data elements not supported by the current version.
  • If the RDB contains new representations or encodings of existing key types or other data elements.

This change also prepares for the next RDB version bump. A range of RDB versions (12-79) is reserved, since it's expected to be used by foreign software RDB versions, so Valkey will not accept versions in this range even with the relaxed version check. The DUMP/RESTORE format has no magic string; only the RDB version number.

This change also prepares for the magic string to change from REDIS to VALKEY next time we bump the RDB version.

Related to #1108.

@zuiderkwast zuiderkwast force-pushed the rdb-version-check-relaxed branch from 8ca5d4c to 12067d4 Compare January 22, 2025 15:50
Copy link

codecov bot commented Jan 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.84%. Comparing base (7fc958d) to head (1ec49fa).
Report is 9 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1604      +/-   ##
============================================
+ Coverage     70.80%   70.84%   +0.04%     
============================================
  Files           121      121              
  Lines         65132    65137       +5     
============================================
+ Hits          46118    46149      +31     
+ Misses        19014    18988      -26     
Files with missing lines Coverage Δ
src/cluster.c 89.23% <100.00%> (+0.01%) ⬆️
src/config.c 78.39% <ø> (ø)
src/rdb.c 77.35% <100.00%> (+0.49%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 13 files with indirect coverage changes

@zuiderkwast zuiderkwast requested a review from madolson January 22, 2025 16:26
Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New config rdb-version-check with values:

  • strict: Reject future RDB versions.
  • relaxed: Try parsing future RDB versions and fail only when an unknown RDB opcode or type is encountered.

Should we introduce three modes? last one being load all the keys which are feasible to be done on the server and instead of failing on error, just skip the key.

For caching scenarios where the application can handle pulling missing key from the database, the third mode might be handy.

Copy link
Member

@soloestoy soloestoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my concerns is that even if users don't utilize the new features, failures might still occur.

For example, when we upgraded RDB encoding from ziplist to listpack, from the user's perspective, they weren't using any new features and were simply performing regular operations like LPUSH, SADD, or HSET. However, the RDB encoding had changed, making a successful downgrade impossible.

In such cases, users might challenge us and demand a downgrade of the RDB version, which is something I'd very much like to avoid. If we could reach an agreement here—that is, we only allow a "best effort" approach to loading a higher version RDB and won't compromise on performing RDB version downgrades in the future—then this PR would be acceptable. It's important to establish this baseline, and the baseline will not be repeatedly tested or challenged in the future.

Moreover, I am currently working on full AOF synchronization development. I believe this approach is more promising than attempting to load a higher version RDB, so I think we can put this PR on hold.

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change makes sense.

Moreover, I am currently working on full AOF synchronization development. I believe this approach is more promising than attempting to load a higher version RDB, so I think we can put this PR on hold.

Your proposal doesn't solve the problem this change is addressing, since we can still end up with data that the lower version doesn't know how to store. I think we should make progress on this PR independent of your suggested change.

valkey.conf Outdated Show resolved Hide resolved
tests/unit/dump.tcl Outdated Show resolved Hide resolved
@soloestoy soloestoy added the major-decision-pending Major decision pending by TSC team label Jan 23, 2025
@soloestoy
Copy link
Member

Your proposal doesn't solve the problem this change is addressing, since we can still end up with data that the lower version doesn't know how to store.

AOF full-resync is also not meant to completely solve this issue, but I believe it will be better than RDB. At this point, RDB encoding changes are unaffected.

I think we should make progress on this PR independent of your suggested change.

Perhaps I didn't express clearly. AOF full-resync and this PR are not related, but I am very concerned about the future risks that this PR may bring. I hope that the things we do will stop at only "trying our best" to load higher version RDBs, and that we will not compromise by doing RDB version downgrades in the future. This is the premise that I care about.

@madolson
Copy link
Member

madolson commented Jan 23, 2025

Perhaps I didn't express clearly. AOF full-resync and this PR are not related, but I am very concerned about the future risks that this PR may bring. I hope that the things we do will stop at only "trying our best" to load higher version RDBs, and that we will not compromise by doing RDB version downgrades in the future. This is the premise that I care about.

Let's commit to it then! AWS is also interested in supporting forward loading of data from newer versions as long as it is compatible. I think the other managed providers would love this as well. I am fine introducing a much stronger guarantee as long as people are willing to implement the functionality. I have no concerns with saying we are "indefinitely forward" compatible as long as the data is compatible.

@soloestoy
Copy link
Member

I'm OK with that, @valkey-io/core-team let's vote on it: we support relaxing the RDB version check and will "try our best" to load recognizable data. For higher version data that cannot be recognized, we will not and will not in the future support on downgrading compatibility for RDB versions.

Co-authored-by: Madelyn Olson <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
@zuiderkwast
Copy link
Contributor Author

@soloestoy Yes, best-effort and it can fail even if no new features are used. That's what the second bullet in the top comment explains: The loading is expected to fail "if the RDB contains new representations or encodings of existing key types or other data elements". This means that it can fail even if no new features are used.

It's important to establish this baseline, and the baseline will not be repeatedly tested or challenged in the future.

I agree with you. However, whatever we decide now, I think we will be very careful in the future about changing encodings (listpacks, etc.) because we will not want to break the RDB forward-compatibility. We will probably discuss if changing the encoding is worth it or not.

Moreover, I am currently working on full AOF synchronization development.

This is very good news! I really hope we can have it in 8.1.

We already have REPLCONF VERSION, so the primary already knows the replica's version. If the primary can see that the replica can't read the RDB version, then I wish it can send an AOF full sync as a fallback. If we do that, then don't need to worry about doing RDB version bumps in the future. Let's discuss it in #59.

@hwware
Copy link
Member

hwware commented Jan 24, 2025

@soloestoy We are not sure if you will join the next meeting on Monday because of the Chinese lunar new year, please vote here if possible. Thanks

@madolson madolson added the release-notes This issue should get a line item in the release notes label Jan 27, 2025
@zuiderkwast zuiderkwast added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Jan 27, 2025
@zuiderkwast zuiderkwast merged commit e9b8970 into valkey-io:unstable Jan 27, 2025
50 checks passed
@zuiderkwast zuiderkwast deleted the rdb-version-check-relaxed branch January 27, 2025 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-approved Major decision approved by TSC team release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants