Maybe we are off in Command spawn trying to create a dataset and... well, I'm not sure what could fail there other than the attempt to create the dataset itself.
The agent log from that time shows that something bad has happened, but no panic messages are logged.
We have many failures like this:
2024-12-10 01:38:06.019Z ERRO crucible-agent/24992 (worker) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: Dataset 2cac30ef-1d1e-40ff-b7cc-979c01286442 creation failed: zfs create failed! out: err:cannot create 'oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible/regions/2cac30ef-1d1e-40ff-b7cc-979c01286442': out of space
2024-12-10 01:38:06.019Z INFO crucible-agent/24992 (datafile) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: region 2cac30ef-1d1e-40ff-b7cc-979c01286442 state: Requested -> Failed
2024-12-10 01:38:06.755Z INFO crucible-agent/24992 (dropshot) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: request completed
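To get a feel for how many regions hit this, the failure lines can be tallied with a short script. This is only a sketch: the function name `summarize_failures` and the `SAMPLE` list are made up for illustration, and the regular expression assumes the "Dataset <id> creation failed:" wording seen in the excerpt above.

```python
import re
from collections import Counter

# Assumption: failure lines follow the "Dataset <uuid> creation failed: ..."
# wording shown in the agent log excerpt above.
FAILED_RE = re.compile(
    r"Dataset (?P<region>[0-9a-f-]{36}) creation failed: (?P<detail>.*)$"
)

# Sample input: the ERRO and INFO lines quoted above, verbatim.
SAMPLE = [
    "2024-12-10 01:38:06.019Z ERRO crucible-agent/24992 (worker) on "
    "oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: Dataset "
    "2cac30ef-1d1e-40ff-b7cc-979c01286442 creation failed: zfs create failed! "
    "out: err:cannot create 'oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible/"
    "regions/2cac30ef-1d1e-40ff-b7cc-979c01286442': out of space",
    "2024-12-10 01:38:06.019Z INFO crucible-agent/24992 (datafile) on "
    "oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: region "
    "2cac30ef-1d1e-40ff-b7cc-979c01286442 state: Requested -> Failed",
]


def summarize_failures(lines):
    """Count out-of-space creation failures per region ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match and "out of space" in match.group("detail"):
            counts[match.group("region")] += 1
    return counts


if __name__ == "__main__":
    for region, count in summarize_failures(SAMPLE).items():
        print(region, count)
```

Running it over the full agent log instead of `SAMPLE` would show whether the failures are concentrated on a few regions or spread across many.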
Then we see that the service has restarted, right around the timestamp on the core file, but there are still no panic messages in the log.
[ Dec 10 02:15:46 Stopping because all processes in service exited. ]
[ Dec 10 02:15:46 Executing start method ("/opt/oxide/crucible/bin/crucible-agent run -D /opt/oxide/crucible/bin/crucible-downstairs --dataset oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible -l [fd00:1122:3344:109::4]:32345 -P 19000 -p downstairs -s snapshot"). ]
note: configured to log to "/dev/stdout"
2024-12-10 02:15:46.746Z INFO crucible-agent/6570 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: dataset: "oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible"
2024-12-10 02:15:46.755Z INFO crucible-agent/6570 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listen IP: [fd00:1122:3344:109::4]:32345
2024-12-10 02:15:46.755Z INFO crucible-agent/6570 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: SMF instance name downstairs_prefix: "downstairs"
2024-12-10 02:15:46.782Z INFO crucible-agent/6570 (datafile) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: Using conf_path:"/data/crucible.json"
2024-12-10 02:15:46.872Z INFO crucible-agent/6570 (dropshot) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listening
local_addr = [fd00:1122:3344:109::4]:32345
[ Dec 10 02:17:33 Stopping because all processes in service exited. ]
[ Dec 10 02:17:33 Executing start method ("/opt/oxide/crucible/bin/crucible-agent run -D /opt/oxide/crucible/bin/crucible-downstairs --dataset oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible -l [fd00:1122:3344:109::4]:32345 -P 19000 -p downstairs -s snapshot"). ]
note: configured to log to "/dev/stdout"
2024-12-10 02:17:33.706Z INFO crucible-agent/8871 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: dataset: "oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible"
2024-12-10 02:17:33.713Z INFO crucible-agent/8871 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listen IP: [fd00:1122:3344:109::4]:32345
2024-12-10 02:17:33.713Z INFO crucible-agent/8871 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: SMF instance name downstairs_prefix: "downstairs"
2024-12-10 02:17:33.741Z INFO crucible-agent/8871 (datafile) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: Using conf_path:"/data/crucible.json"
2024-12-10 02:17:33.828Z INFO crucible-agent/8871 (dropshot) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listening
local_addr = [fd00:1122:3344:109::4]:32345
[ Dec 10 02:17:56 Stopping because all processes in service exited. ]
[ Dec 10 02:17:56 Executing start method ("/opt/oxide/crucible/bin/crucible-agent run -D /opt/oxide/crucible/bin/crucible-downstairs --dataset oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible -l [fd00:1122:3344:109::4]:32345 -P 19000 -p downstairs -s snapshot"). ]
note: configured to log to "/dev/stdout"
2024-12-10 02:17:56.292Z INFO crucible-agent/9148 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: dataset: "oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible"
2024-12-10 02:17:56.300Z INFO crucible-agent/9148 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listen IP: [fd00:1122:3344:109::4]:32345
2024-12-10 02:17:56.300Z INFO crucible-agent/9148 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: SMF instance name downstairs_prefix: "downstairs"
2024-12-10 02:17:56.324Z INFO crucible-agent/9148 (datafile) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: Using conf_path:"/data/crucible.json"
2024-12-10 02:17:56.406Z INFO crucible-agent/9148 (dropshot) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listening
local_addr = [fd00:1122:3344:109::4]:32345
[ Dec 10 02:17:58 Stopping because all processes in service exited. ]
[ Dec 10 02:17:59 Executing start method ("/opt/oxide/crucible/bin/crucible-agent run -D /opt/oxide/crucible/bin/crucible-downstairs --dataset oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible -l [fd00:1122:3344:109::4]:32345 -P 19000 -p downstairs -s snapshot"). ]
note: configured to log to "/dev/stdout"
2024-12-10 02:17:59.020Z INFO crucible-agent/9157 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: dataset: "oxp_ae56280b-17ce-4266-8573-e1da9db6c6bb/crucible"
2024-12-10 02:17:59.027Z INFO crucible-agent/9157 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listen IP: [fd00:1122:3344:109::4]:32345
2024-12-10 02:17:59.027Z INFO crucible-agent/9157 on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: SMF instance name downstairs_prefix: "downstairs"
2024-12-10 02:17:59.051Z INFO crucible-agent/9157 (datafile) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: Using conf_path:"/data/crucible.json"
2024-12-10 02:17:59.138Z INFO crucible-agent/9157 (dropshot) on oxz_crucible_90b53c3d-42fa-4ca9-bbfc-96fff245b508: listening
local_addr = [fd00:1122:3344:109::4]:32345
note: configured to log to "/dev/stdout"
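The restart cadence in the SMF log above can be pulled out mechanically, which makes it easier to line restarts up with core file timestamps. A rough sketch follows; `restart_gaps` and `SAMPLE` are hypothetical names, the start-method command is abbreviated to "..." in the sample lines, and the year is an assumption because the SMF bracket lines do not carry one (2024 is taken from the agent's own timestamps).

```python
import re
from datetime import datetime

# Matches SMF lines of the form "[ Dec 10 02:15:46 Executing start method (...". 
START_RE = re.compile(
    r"^\[ (\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) Executing start method"
)

# Sample input: the four start-method lines above, with the command abbreviated.
SAMPLE = [
    '[ Dec 10 02:15:46 Stopping because all processes in service exited. ]',
    '[ Dec 10 02:15:46 Executing start method ("..."). ]',
    '[ Dec 10 02:17:33 Executing start method ("..."). ]',
    '[ Dec 10 02:17:56 Executing start method ("..."). ]',
    '[ Dec 10 02:17:59 Executing start method ("..."). ]',
]


def restart_gaps(lines, year=2024):
    """Return the seconds between consecutive service starts.

    The year is an assumption supplied by the caller, since the SMF
    bracket lines only include month, day, and time.
    """
    stamps = []
    for line in lines:
        match = START_RE.match(line)
        if match:
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(restart_gaps(SAMPLE))  # -> [107, 23, 3]
```

For the excerpt above the gaps shrink from 107 seconds to 3 seconds, which matches the impression that the agent is dying almost immediately after it comes back up.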
On rack2, we managed to run a few datasets out of space.
As a side effect, crucible-agent could not make progress and, even worse, started dumping core over and over:
Here is a subset of the cores from a single minute:
The backtrace is not very helpful:
stacks gives me a potential clue: