Skip to content

Conversation

@diego-ciciani01
Copy link

Problem e test suite fails when IPv6 is not available on the system with the error:

The test suite fails when IPv6 is not available on the system

Solution

  • Add is_ipv6_available() to detect IPv6 support
  • Add VALLKEY_REQUIRE_IPV6 env var (inside a 'if' condition) to force IPv6 tests in CI, implement in the pipeline the export and volorization of env var
  • Skip IPv6 tests automatically when IPv6 is not available
  • Wrapped IPv6 tests in tests/unit/introspection.tcl to skip when IPv6 is unavailable

Testing

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job. However, I think we should do it using the existing mechanism to skip tests based on the existing tagging.

The tests/test_helper.tcl is the entry point that checks command line arguments. Here, we can add "ipv6" to ::denytags if IPv6 is not available and not required.

Comment on lines 508 to 526
if {[catch {exec ifconfig lo0} result] == 0} {
if {[string match "*inet6*::1*" $result]} {
return 1
}
}

if {[catch {exec netstat -an} result] == 0} {
if {[string match "*::1*" $result]} {
return 1
}
}

if {[catch {
set sock [socket -async ::1 22]
close $sock
return 1
}]} {
return 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need three ways? Can we use only the 3rd way to detect this and save the result in some global variable to avoid doing it multiple times?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, multiple checks can seem a bit repetitive, but during development, the third test on my macbook never worked (I'll leave a test I did in the comments, but I didn't add it to the PR). I also made sure that IPv6 was enabled, so as a workaround I also did the first two checks.

I currently have version 9 of tcl.

here the the did

proc is_ipv6_available {} { if {[catch { set sock [socket -async ::1 0] close $sock } err]} { puts "ipv6 check failed: $err" return 0 } return 1 }

and i leave the result of the running test

Screenshot 2025-10-07 alle 11 43 23

Copy link
Contributor

@zuiderkwast zuiderkwast Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're testing to connect to port 22 = the SSH port. It is not always opened. We can not even assume that SSH is installed.

I think we can't assume that any specific port is opened. What we need to do is to create a server socket on a free port chosen by the OS. To do this, use port 0, a command like socket -server ::1 0. Read details under SERVER SOCKETS on the page https://www.tcl-lang.org/man/tcl/TclCmd/socket.html#M9

This is what I get on my system where IPv6 is available:

$ tclsh
% set s [socket -server ::1 0]
sockaaab2529cdd0
% chan configure $s -sockname
0.0.0.0 0.0.0.0 33953 :: :: 33953

Do the tests work with TCL 9? Do you get some error when running all the tests?

From what I remember, TCL 9 didn't work and I have opened #1673 to fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this:

Suggested change
if {[catch {exec ifconfig lo0} result] == 0} {
if {[string match "*inet6*::1*" $result]} {
return 1
}
}
if {[catch {exec netstat -an} result] == 0} {
if {[string match "*::1*" $result]} {
return 1
}
}
if {[catch {
set sock [socket -async ::1 22]
close $sock
return 1
}]} {
return 0
}
if {[catch {
set sock [socket -server ::1 0]
close $sock
return 1
}]} {
return 0
}

Copy link
Contributor

@zuiderkwast zuiderkwast Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how TCL behaves, but if set sock [socket -server ::1 0] never fails, even if IPv6 is not available, then we maybe also need to try to connect as a client to this server socket.

Something like this:

    if {[catch {
        set server [socket -server ::1 0]
        set port [lindex [chan configure $server -sockname] 5]
        set client [socket ::1 $port]
        close $server
        close $client
        return 1
    }]} {
        return 0
    }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I solved the problem. As you said, version 9 of TCL doesn't work very well. I switched to 8.6, and now the socket opens correctly.

I will test also the client - server connect

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you set set port [lindex [chan configure $server -sockname] 5] ?

i thinks that the correct value should be 2, because this position returns the port

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the docs for this but I tested locally in tclsh and I got the port on two places:

% chan configure $sock -sockname           
0.0.0.0 0.0.0.0 35783 :: :: 35783

It looks like IPv4 IPv4 port IPv6 IPv6 port. The last port maybe is for IPv6. That's what I was thinking. I don't know if it's correct. It's better to find the proper documentation for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah it make sense, so i found this doc ->https://core.tcl-lang.org/tips/doc/trunk/tip/162.md
which explains exactly your test.

So i also added few commits with the changes that you told me

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah it make sense, so i found this doc ->https://core.tcl-lang.org/tips/doc/trunk/tip/162.md
which explains exactly your test.

So i also added few commits with the changes that you told me

@diego-ciciani01
Copy link
Author

hi @zuiderkwast, thank you very much for your feedback. If you need any further clarification, please let me know. Your advice is clear to me, please also let me know what you think about the IPv6 socket issue, after which I will modify the PR with your suggestions.

@codecov
Copy link

codecov bot commented Oct 7, 2025

Codecov Report

❌ Patch coverage is 93.26923% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.62%. Comparing base (8182f4a) to head (fe06ea2).
⚠️ Report is 17 commits behind head on unstable.

Files with missing lines Patch % Lines
src/replication.c 25.00% 3 Missing ⚠️
src/scripting_engine.c 62.50% 3 Missing ⚠️
src/functions.c 96.87% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2674      +/-   ##
============================================
+ Coverage     72.58%   72.62%   +0.04%     
============================================
  Files           128      128              
  Lines         71277    71303      +26     
============================================
+ Hits          51736    51786      +50     
+ Misses        19541    19517      -24     
Files with missing lines Coverage Δ
src/blocked.c 91.32% <100.00%> (ø)
src/cluster_migrateslots.c 91.69% <ø> (-0.40%) ⬇️
src/commands.def 100.00% <ø> (ø)
src/db.c 93.24% <100.00%> (ø)
src/eval.c 89.23% <100.00%> (ø)
src/hashtable.c 89.32% <100.00%> (+0.01%) ⬆️
src/kvstore.c 96.01% <100.00%> (ø)
src/lazyfree.c 92.56% <100.00%> (+6.17%) ⬆️
src/lua/engine_lua.c 88.12% <100.00%> (-0.22%) ⬇️
src/lua/function_lua.c 98.26% <100.00%> (-0.45%) ⬇️
... and 6 more

... and 17 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Add is_ipv6_available() to detect IPv6 support

 - Add VALLKEY_REQUIRE_IPV6 env var (inside a 'if' condition) to force  IPv6 tests in CI, implement in the pipeline the export and volorization of env var

- Skip IPv6 tests automatically when IPv6 is not available

- BUG valkey-io#2643

Signed-off-by: diego-ciciani01 <[email protected]>
Moves utility functions previously defined in  to the central test helper file, , for better organization and consistency across the testing framework.

Additionally, this commit introduces a new tag-based logic to conditionally run tests based on the availability of IPv6 support on the system. Tests requiring IPv6 connectivity are now tagged accordingly, ensuring that they are only executed when the necessary network configuration is present.

Signed-off-by: diego-ciciani01 <[email protected]>
Moves utility functions previously defined in  to the central test helper file, , for better organization and consistency across the testing framework.

Additionally, this commit introduces a new tag-based logic to conditionally run tests based on the availability of IPv6 support on the system. Tests requiring IPv6 connectivity are now tagged accordingly, ensuring that they are only executed when the necessary network configuration is present.

Signed-off-by: diego-ciciani01 <[email protected]>
@diego-ciciani01 diego-ciciani01 force-pushed the fix-ipv6-test-detection branch from 5de3e86 to 63e36cb Compare October 9, 2025 15:32
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! This is getting closer. I have some more ideas though after reading code and documentation. (None of us is a Tcl expert.)

$c close
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

r client list not-ip $not_ip not-ip $not_ip
} {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines 993 to 994
#IPv6 detection utilities
#for this detaction: the socket connection on ::1 address
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space between the # and the text, for better readability.

Suggested change
#IPv6 detection utilities
#for this detaction: the socket connection on ::1 address
# IPv6 detection utilities
# for this detaction: the socket connection on ::1 address

Comment on lines 996 to 998
if {[catch {
set server [socket -server ::1 0]
set port [lindex [chan configure $server -sockname] 5]
Copy link
Contributor

@zuiderkwast zuiderkwast Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading the documentation (https://www.tcl-lang.org/man/tcl/TclCmd/socket.html) for server sockets again. The server sockets, the syntax is

socket -server command ?options? port

The IP address is not specified here. The command is the code to run when a client connects to it. The documentation says:

For each connection Tcl will create a new channel that may be used to communicate with the client. Tcl then invokes command (properly a command prefix list, see the EXAMPLES below) with three additional arguments: the name of the new channel, the address, in network address notation, of the client's host, and the client's port number.

Anyway, it seems to be OK to not have a correct command here, but I guess we should not write ::1 here because it's not an IP address at all. Better just use dummy or as the command (which will never be called anyway if the server doesn't start an event loop using vwait).

Also, if we use the option -myaddr ::1, then the server will only listen on the IPv6 address and the chan configure will only return 3 elements, where the last one is the port. That's also documented here under CONFIGURATION OPTIONS, -sockname:

For server sockets this option returns a list of a multiple of three elements each group of which have the same meaning as described above. The list contains more than one group when the server socket was created without -myaddr or with the argument to -myaddr being a domain name that resolves multiple IP addresses that are local to the invoking host.

One more thing, the indentation here looks odd. Use 4 spaces for each level.

To sum up:

Suggested change
if {[catch {
set server [socket -server ::1 0]
set port [lindex [chan configure $server -sockname] 5]
if {[catch {
set server [socket -server dummy -myaddr ::1 0]
set port [lindex [chan configure $server -sockname] 2]

Comment on lines 1008 to 1012
#Check if is required the flag to decide if run the tests or not
if {!$::require_ipv6 && ![is_ipv6_available]} {
lappend ::denytags "ipv6"
puts "IPv6 not available on this system, skipping IPv6 tests"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this will work, but comparing with other similar mechanisms to skip tests in various conditions, this logic should better be placed in the function tags_acceptable in the file tests/support/server.tcl. Add it next to these checks, here: https://github.com/valkey-io/valkey/blob/9.0.0-rc3/tests/support/server.tcl#L266

…check

- Fixed minor typos across test files.

- Updated  to include the new option:

- Moved IPv6 availability check to tests/support/server.tcl.

Signed-off-by: diego-ciciani01 <[email protected]>
Comment on lines 271 to 273
if {!$::require_ipv6 && ![is_ipv6_available]} {
lappend ::denytags "ipv6"
set err "IPv6 not available on this system, skipping IPv6 tests"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears you didn't get the purpose of this function and how it works.

This function runs for each test (and for start_server where tags can also be specified) and it checks it each test needs to be skipped or not.

For the command-line argument --tags, the ::denytags and ::allowtags are used. When a test is skipped because of ::denytags, it prints "Tag: $tag denied". When a test is skipped for another reason, that specific reason is printed instead, such as "Not supported in cluster mode", etc. The reason I wanted you to move the check to this function is to have a separate check and separate message, and not hijack the command line --tags mechanism.

As you can see for all of the checks in this file, it sets err to an appropriate explanation and then returns 0.

    if {$::debug_defrag && [lsearch $tags "debug_defrag:skip"] >= 0} {
        set err "Not supported on server compiled with DEBUG_FORCE_DEFRAG option"
        return 0
    }

    if {$::singledb && [lsearch $tags "singledb:skip"] >= 0} {
        set err "Not supported on singledb"
        return 0
    }

    if {$::cluster_mode && [lsearch $tags "cluster:skip"] >= 0} {
        set err "Not supported in cluster mode"
        return 0
    }

    if {$::tls && [lsearch $tags "tls:skip"] >= 0} {
        set err "Not supported in tls mode"
        return 0
    }

Each check checks if $tags contains a specific tag and also checks if some condition is met, then sets err and returns 0 (which means skip the this test).

What you did here is to set err to "IPv6 not available on this system, skipping IPv6 tests" but the function still returns 1 below so the first test is not skipped, and as a side-effect, ipv6 is added to ::denytags as if --tags -ipv6 was given on the command line. This means that the next test that has the ipv6 will be skipped as if because ipv6 is in ::denytags and the error message will be "Tag: ipv6 denied". This is what is printed when --tags -ipv6 is used on the command line. That's why I said "hijack" this machanism.

What the check needs to do instead is for check if "ipv6" is among the tags for this test and if it's not required and not available, then set the message and return 0. Something like...

Suggested change
if {!$::require_ipv6 && ![is_ipv6_available]} {
lappend ::denytags "ipv6"
set err "IPv6 not available on this system, skipping IPv6 tests"
if {!$::require_ipv6 && [lsearch $tags "ipv6"] >= 0 && ![is_ipv6_available]} {
set err "IPv6 not available on this system"
return 0

We could also add an optimization to avoid running the auto-detect for every test case with the ipv6 tag (but if that's not too many tests, then it might be OK to run it every time).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now i get what you mean. For optimization of the test, we can save the result of IPv6 check on a variable, to avoid performing the test unnecessarily. I agree to take this into consideration when there are a considerable number of IPv6 tests. At the moment, I only count 5.

"--io-threads Run tests with IO threads."
"--tls Run tests in TLS mode."
"--tls-module Run tests in TLS mode with Valkey module."
"--require-ipv6 Run tests forcing the ipv6."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm think if we need this flag or not. Comparing to other checks, I can see that we just skip tests that are not possible to run. For example we simply skip all ipv6 tests if we're running TCL 8.5. Similarly, some tests are skipped if TLS is disabled, if IO threads are used, and so on.

None of the other skipped tests have a require flag, so comparing to these other checks, I think we don't need this --require-ipv6 flag. We can just skip them if it's not available.

Sorry for changing my mind! 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry, hahah.
If I understand correctly, you just want to remove the print "--require-ipv6 Run tests forcing the ipv6.”
from, "print_help_screen" ? The meccanismo with the require_ipv6 variable remain the same ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, you just want to remove the print "--require-ipv6 Run tests forcing the ipv6.” from, "print_help_screen" ? The meccanismo with the require_ipv6 variable remain the same ?

No, I think we can remove the require_ipv6 variable completely.

There is no variables like that for other things that are only sometimes supported.


start_server {tags {"ipv6"} overrides {bind {127.0.0.1 ::1}}} {
test {CLIENT LIST with IPv6 filter} {
test {CLIENT KILL with IPv6 filter} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!


regexp {addr=\[([a-fA-F0-9:]+)\]:\d+} $client_info -> ipv6only
set filtered [$c client list not-ip "1.2.3.4" not-ip $ipv6only]
set filtered [$c client list not-ip $ipv6only]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks unnecessary. Did you review your own diff after fixing the merge conflict?

zuiderkwast and others added 16 commits October 21, 2025 11:50
When using `skip` inside a test to skip a test, when running with
--dump-logs it causes the logs to be dumped. Introduced in valkey-io#2342.

The reason is the "skipped" exception is caught in the start_server proc
in tests/support/server.tcl. This is where the $::dump_logs variable is
checked.

Signed-off-by: Viktor Söderqvist <[email protected]>
…alkey-io#747)

Currently, zdiff() becomes a no-op in the case that the first key is
empty.

The existing comment of "Skip everything if the smallest input is empty"
is misleading, as qsort is bypassed for the case of ZDIFF. There's no
guarantee that the first key holds the smallest cardinality.

The real reason behind such bypass is the following. ZDIFF computes the
difference between the first and all successive input sorted sets.
Meaning, if the first key is empty, we cannot reduce further from an
already empty collection, and thus zdiff() becomes a no-op.

Signed-off-by: Kyle Kim <[email protected]>
DEBUG LOADAOF sometimes works but it results in -LOADING responses to
the primary so there are lots of race conditions. It isn't something we
should be doing anyways. To test, I just disconnect the replica before
loading the AOF, then reconnect it.

Fixes valkey-io#2712

Signed-off-by: Jacob Murphy <[email protected]>
This test was failing, and causing the next test to throw an exception.
It is failing since we never waited for the slot migration to connect
before proceeding.

Fixes valkey-io#2692

Signed-off-by: Jacob Murphy <[email protected]>
…child snapshot is active (valkey-io#2721)

The race condition causes the client to be used and subsequently double
freed by the slot migration read pipe handler. The order of events is:

1. We kill the slot migration child process during CANCELSLOTMIGRATIONS
1. We then free the associated client to the target node
1. Although we kill the child process, it is not guaranteed that the
pipe will be empty from child to parent
1. If the pipe is not empty, we later will read that out in the
slotMigrationPipeReadHandler
1. In the pipe read handler, we attempt to write to the connection. If
writing to the connection fails, we will attempt to free the client
1. However, the client was already freed, so this a double free

Notably, the slot migration being aborted doesn't need to be triggered
by `CANCELSLOTMIGRATIONS`, it can be any failure.

To solve this, we simply:

1. Set the slot migration pipe connection to NULL whenever it is
unlinked
2. Bail out early in slot migration pipe read handler if the connection
is NULL

I also consolidate the killSlotMigrationChild call to one code path,
which is executed on client unlink. Before, there were two code paths
that would do this twice (once on slot migration job finish, and once on
client unlink). Sending the signal twice is fine, but inefficient.

Also, add a test to cancel during the slot migration snapshot to make
sure this case is covered (we only caught it during the module test).

---------

Signed-off-by: Jacob Murphy <[email protected]>
valkey-io#2329 intoduced a bug that causes a blocked client in cluster mode to
receive two MOVED redirects instead of one. This was not seen in tests,
except in the reply schema validator.

The fix makes sure the client's pending command is cleared after sending
the MOVED redirect, to prevent if from being reprocessed.

Fixes valkey-io#2676.

---------

Signed-off-by: Viktor Söderqvist <[email protected]>
Resolves valkey-io#2696 

The primary issue was that with sanitizer mode, the test needed more
time for primary’s replication buffers grow beyond `2 × backlog_size`.
Increasing the threshold of `repl-timeout` to 30s, ensures that the
inactive replica is not disconnected while the full sync is proceeding.
`rdb-key-save-delay` controls or throttles the data written to the
client output buffer, and in this case, we are deterministically able to
perform the fullsync within 10s (10000 keys * 0.001s).

Increasing the `wait_for_condition` gives it enough retries to verify
that `mem_total_replication_buffers` reaches the required `2 ×
backlog_size`.

Signed-off-by: Sarthak Aggarwal <[email protected]>
…SLOTS ESTABLISH (valkey-io#2498)

We now have valkey-io#2688 SYNCSLOTS CAPA, remove the outupdated comment.

Signed-off-by: Binbin <[email protected]>
increased the wait time to a total of 10 seconds where we check the log
for `Done loading RDB` message

Fixes valkey-io#2694

CI run (100 times):
https://github.com/roshkhatri/valkey/actions/runs/18576201712/job/52961907806

Signed-off-by: Roshan Khatri <[email protected]>
…load crash (valkey-io#1826)

There will be two issues in this test:
```
test {FUNCTION - test function flush} {
    for {set i 0} {$i < 10000} {incr i} {
        r function load [get_function_code LUA test_$i test_$i {return 'hello'}]
    }
    set before_flush_memory [s used_memory_vm_functions]
    r function flush sync
    set after_flush_memory [s used_memory_vm_functions]
    puts "flush sync, before_flush_memory: $before_flush_memory, after_flush_memory: $after_flush_memory"

    for {set i 0} {$i < 10000} {incr i} {
        r function load [get_function_code LUA test_$i test_$i {return 'hello'}]
    }
    set before_flush_memory [s used_memory_vm_functions]
    r function flush async
    set after_flush_memory [s used_memory_vm_functions]
    puts "flush async, before_flush_memory: $before_flush_memory, after_flush_memory: $after_flush_memory"

    for {set i 0} {$i < 10000} {incr i} {
        r function load [get_function_code LUA test_$i test_$i {return 'hello'}]
    }
    puts "Test done"
}
```

The first one is the test output, we can see that after executing
FUNCTION FLUSH,
used_memory_vm_functions has not changed at all:
```
flush sync, before_flush_memory: 2962432, after_flush_memory: 2962432
flush async, before_flush_memory: 4504576, after_flush_memory: 4504576
```

The second one is there is a crash when loading the functions during the
async
flush:
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
 # valkey 255.255.255 crashed by signal: 11, si_code: 2
 # Accessing address: 0xe0429b7100000a3c
 # Crashed running the instruction at: 0x102e0b09c

------ STACK TRACE ------
EIP:
0   valkey-server                       0x0000000102e0b09c luaH_getstr + 52

Backtrace:
0   libsystem_platform.dylib            0x000000018b066584 _sigtramp + 56
1   valkey-server                       0x0000000102e01054 luaD_precall + 96
2   valkey-server                       0x0000000102e01b10 luaD_call + 104
3   valkey-server                       0x0000000102e00d1c luaD_rawrunprotected + 76
4   valkey-server                       0x0000000102e01e3c luaD_pcall + 60
5   valkey-server                       0x0000000102dfc630 lua_pcall + 300
6   valkey-server                       0x0000000102f77770 luaEngineCompileCode + 708
7   valkey-server                       0x0000000102f71f50 scriptingEngineCallCompileCode + 104
8   valkey-server                       0x0000000102f700b0 functionsCreateWithLibraryCtx + 2088
9   valkey-server                       0x0000000102f70898 functionLoadCommand + 312
10  valkey-server                       0x0000000102e3978c call + 416
11  valkey-server                       0x0000000102e3b5b8 processCommand + 3340
12  valkey-server                       0x0000000102e563cc processInputBuffer + 520
13  valkey-server                       0x0000000102e55808 readQueryFromClient + 92
14  valkey-server                       0x0000000102f696e0 connSocketEventHandler + 180
15  valkey-server                       0x0000000102e20480 aeProcessEvents + 372
16  valkey-server                       0x0000000102e4aad0 main + 26412
17  dyld                                0x000000018acab154 start + 2476

------ STACK TRACE DONE ------
```

The reason is that, in the old implementation (introduced in 7.0),
FUNCTION FLUSH
use lua_unref to remove the script from lua VM. lua_unref does not
trigger the gc,
it causes us to not be able to effectively reclaim memory after the
FUNCTION FLUSH.

The other issue is that, since we don't re-create the lua VM in FUNCTION
FLUSH,
loading the functions during a FUNCTION FLUSH ASYNC will result a crash
because
lua engine state is not thread-safe.

The correct solution is to re-create a new Lua VM to use, just like
SCRIPT FLUSH.

---------

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Ricardo Dias <[email protected]>
…on (valkey-io#2749)

When working on valkey-io#2635 I errorneously duplicated the
setSlotImportingStateInAllDbs call for successful imports. This resulted
in us doubling the key count in the kvstore. This results in DBSIZE
reporting an incorrect sum, and also causes BIT corruption that can
eventually result in a crash.

The solution is:

1. Only call setSlotImportingStateInAllDbs once (in our
finishSlotMigrationJob function)
2. Make setSlotImportingStateInAllDbs idempotent by checking if the
delete from the kvstore importing hashtable is a no-op

This also fixes a bug where the number of importing keys is not lowered
after the migration, but this is less critical since it is only used
when resizing the dictionary on RDB load. However, it could result in
un-loadable RDBs if the importing key count gets large enough.

Signed-off-by: Jacob Murphy <[email protected]>
This was introduced in valkey-io#1826. This create an `Uninitialised value was
created by a heap allocation` in the CI.

Signed-off-by: Binbin <[email protected]>
…iteration (valkey-io#2753)

Safe iterators pause rehashing, but don't pause auto shrinking. This
allows stale bucket references which then cause use after free (in this
case, via compactBucketChain on a deleted bucket).

This problem is easily reproducible via atomic slot migration, where we
call delKeysInSlot which relies on calling delete within a safe
iterator. After the fix, it no longer causes a crash.

Since all cases where rehashing is paused expect auto shrinking to also
be paused, I am making this happen automatically as part of pausing
reshashing.

---------

Signed-off-by: Jacob Murphy <[email protected]>
enjoy-binbin and others added 2 commits October 21, 2025 11:50
This clusterLink->flags was added in valkey-io#2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <[email protected]>
- Is been removed the IPv6 flag

The return statement inside a catch block doesn't propagate out of
the procedure. Check catch return value (0 = success) instead.
Reference: https://www.tcl.tk/man/tcl8.6/TclCmd/catch.html

Signed-off-by: diego-ciciani01 <[email protected]>
@diego-ciciani01
Copy link
Author

hi @zuiderkwast

While testing, I noticed an important detail about the catch usage in is_ipv6_available().

Initially, we had:

proc is_ipv6_available {} {
    if {[catch {
        set server [socket -server dummy -myaddr ::1 0]
        set port [lindex [chan configure $server -sockname] 2]
        set client [socket ::1 $port]
        close $server
        close $client
        return 1
    }]} {
        return 0
    }
}

However, the return 1 inside the catch block doesn't actually return from the function - it only exits the catch block. This means if the socket operations succeed, the function would fall through without returning a value.

The correct pattern, i think is :

proc is_ipv6_available {} {
    if {[catch {
        set server [socket -server dummy -myaddr ::1 0]
        set port [lindex [chan configure $server -sockname] 2]
        set client [socket ::1 $port]
        close $server
        close $client
    }] == 0} {
        return 1
    }
    return 0
}

According to the Tcl catch documentation, catch returns 0 on success and non-zero on error, but return statements inside the catch block don't propagate out of the procedure.

So let me know what do you think

@zuiderkwast
Copy link
Contributor

However, the return 1 inside the catch block doesn't actually return from the function - it only exits the catch block.

Oh, I didn't know that! TCL is weird. Thank you! 😄

@zuiderkwast
Copy link
Contributor

This PR has now a lot of unrelated commits and changed 33 files. What did you do?

The recommended way is to merge unstable into the branch and do a normal push, not a rebase and force-push. It doesn't matter if there are merge-commits, because when we merge, we will squash all the commits into one anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] make test should detect ipv6 support

8 participants