(lib/runtime): fail to execute block `17088245` allocator out of space #3774

EclesioMeloJunior · 2024-02-26T12:42:24Z

2024-02-24T03:55:37Z INFO     imported block 0x2aad31837b6cdf8ce5964cca86b3a39b67ff872aa5eb6c954217872f6683d63d and stored state trie with root 0xe40e9900f63d1539014ce5cc4fa0e4932ae4ed1154c31cd77714d0fdf2d4e8b3	pkg=core
RUNTIME HASH: 0xd1d708433e5d7bcb9fc30a3de67952c1611fa58cd1980146ab07aa438116d352
2024-02-24T03:55:37Z ERROR    block data processing for block with hash 0x0feb9c8bb9018bbcf20b467e15636de02397de00c66233c5b4a06f5be3a2f558 failed: processing block data with header and body: handling block: failed to execute block 17088245: running runtime function: bumping: allocator out of space: current pages 65535 greater than max wasm pages 65535 (recovered by wazero)
wasm stack trace:
	env.ext_allocator_malloc_version_1(i32) i32
	._ZN5sp_io9allocator26extern_host_function_impls6malloc17h175ebffadfcb2d44E(i32) i32
	._ZN5alloc7raw_vec11finish_grow17h97bec91bef32a5ebE.llvm.1609301459894010017(i32,i32,i32,i32)
	._ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$16reserve_for_push17h1ed9c357b734a946E(i32,i32)
	._ZN18parity_scale_codec5codec19decode_vec_with_len17h15e2f12c18cd243fE(i32,i32,i32)
	._ZN13frame_support7storage8unhashed3get17hb6cef7316d8fee28E(i32,i32,i32)
	._ZN353_$LT$pallet_staking..ExposureOf$LT$T$GT$$u20$as$u20$sp_runtime..traits..Convert$LT$$LT$T$u20$as$u20$frame_system..pallet..Config$GT$..AccountId$C$core..option..Option$LT$pallet_staking..Exposure$LT$$LT$T$u20$as$u20$frame_system..pallet..Config$GT$..AccountId$C$$LT$T$u20$as$u20$pallet_staking..pallet..pallet..Config$GT$..CurrencyBalance$GT$$GT$$GT$$GT$7convert17h469fc883609fa8e1E(i32,i32)
	._ZN5alloc3vec16in_place_collect108_$LT$impl$u20$alloc..vec..spec_from_iter..SpecFromIter$LT$T$C$I$GT$$u20$for$u20$alloc..vec..Vec$LT$T$GT$$GT$9from_iter17h34924d7a99a1d609E(i32,i32)
	._ZN14pallet_session10historical7onchain39store_session_validator_set_to_offchain17h0e99a24533159ab9E(i32)
	._ZN893_$LT$$LP$TupleElement0$C$TupleElement1$C$TupleElement2$C$TupleElement3$C$TupleElement4$C$TupleElement5$C$TupleElement6$C$TupleElement7$C$TupleElement8$C$TupleElement9$C$TupleElement10$C$TupleElement11$C$TupleElement12$C$TupleElement13$C$TupleElement14$C$TupleElement15$C$TupleElement16$C$TupleElement17$C$TupleElement18$C$TupleElement19$C$TupleElement20$C$TupleElement21$C$TupleElement22$C$TupleElement23$C$TupleElement24$C$TupleElement25$C$TupleElement26$C$TupleElement27$C$TupleElement28$C$TupleElement29$C$TupleElement30$C$TupleElement31$C$TupleElement32$C$TupleElement33$C$TupleElement34$C$TupleElement35$C$TupleElement36$C$TupleElement37$C$TupleElement38$C$TupleElement39$C$TupleElement40$C$TupleElement41$C$TupleElement42$C$TupleElement43$C$TupleElement44$C$TupleElement45$C$TupleElement46$C$TupleElement47$RP$$u20$as$u20$frame_support..traits..hooks..OnInitialize$LT$BlockNumber$GT$$GT$13on_initialize17h252b62108cdb4ff8E(i32,i32)
	._ZN15frame_executive104Executive$LT$System$C$Block$C$Context$C$UnsignedValidator$C$AllPalletsWithSystem$C$COnRuntimeUpgrade$GT$16initialize_block17h25c01251da9ce46aE(i32)
	._ZN15frame_executive104Executive$LT$System$C$Block$C$Context$C$UnsignedValidator$C$AllPalletsWithSystem$C$COnRuntimeUpgrade$GT$13execute_block17h04bd9202b85e1554E(i32)
	.Core_execute_block(i32,i32) i64	pkg=sync

edit
After a long debug session with @jimjbrettj™ we found that the allocation problem was related with long running instances and that is not trivial to clean up the runtime memory after a block execution (manually clean up), so here is a suggestion of what can be done to solve and a reference to substrate (https://github.com/paritytech/polkadot-sdk/blob/0c6c837f689a287583508506e342ba07687e8d26/substrate/client/executor/wasmtime/src/runtime.rs#L170C39-L170C55)

Instantiate a runtime for every block execution

The text was updated successfully, but these errors were encountered:

EclesioMeloJunior · 2024-03-11T16:49:04Z

Related to the allocator problem I will need to wait for EC2 node to run towards block 17M to validate the changes to the allocator, I cannot reproduce it locally since when I start the node the memory is completely empty when executing the block, differently from the ec2 node where the allocator/memory is being used every imported block. Given that the allocator exports its metrics through Prometheus we can compare the usage now and before (where Gossamer had less one page)

EclesioMeloJunior · 2024-03-18T15:37:04Z

After investigating substrate FreeingBumpHeapAllocator I notice that Gossamer is not cleaning up the allocator after block execution, instead Gossamer is refreshing the allocator only on every runtime upgrade

Here is the call_impl method which is called when the host needs to call a runtime func

Also the PR #3800 introduces the fix

EclesioMeloJunior · 2024-04-03T21:34:07Z

After investigations with @jimjbrettj™ we found that the allocator still facing the same problem, just reinstantiating the allocator is not enough to solve the allocator out of space problem and we are facing wasm out of bounds access problems as shown in the image belown.

EclesioMeloJunior · 2024-04-05T13:51:47Z

Same problem happening on westend staging at block #9412261

EclesioMeloJunior · 2024-05-13T15:11:13Z

While syncing to #17m we faced a super slowdown after a runtime upgrade (0.9.25) which I am investigating at #3950

EclesioMeloJunior · 2024-06-26T12:50:24Z

The fix was confirmed in our staging nodes given that the latest block finalized was #17272320

fyi @jimjbrettj @P1sar @timwu20

EclesioMeloJunior added P-high this should be addressed ASAP. S-sync-westend related to particular network syncing. labels Feb 26, 2024

EclesioMeloJunior self-assigned this Feb 26, 2024

EclesioMeloJunior added the A-blocked issue or PR is blocked until something else changes. label Mar 11, 2024

EclesioMeloJunior mentioned this issue Mar 18, 2024

feat: wazero fork upgrade to use uint64 for memory.Size() #3800

Merged

EclesioMeloJunior closed this as completed in #3800 Mar 22, 2024

EclesioMeloJunior reopened this Apr 3, 2024

EclesioMeloJunior added this to the Westend synchronization milestone Apr 3, 2024

EclesioMeloJunior mentioned this issue Apr 4, 2024

fix(lib/runtime): allocator out of space and wasm out of bounds memory access #3865

Closed

EclesioMeloJunior removed the A-blocked issue or PR is blocked until something else changes. label Apr 8, 2024

jimjbrettj assigned jimjbrettj and unassigned jimjbrettj Apr 8, 2024

jimjbrettj mentioned this issue Apr 8, 2024

enhancement(wazero): increase sync speed via compilation caching #3880

Closed

EclesioMeloJunior closed this as completed Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(lib/runtime): fail to execute block `17088245` allocator out of space #3774

(lib/runtime): fail to execute block `17088245` allocator out of space #3774

EclesioMeloJunior commented Feb 26, 2024 •

edited

Loading

EclesioMeloJunior commented Mar 11, 2024

EclesioMeloJunior commented Mar 18, 2024 •

edited

Loading

EclesioMeloJunior commented Apr 3, 2024

EclesioMeloJunior commented Apr 5, 2024

EclesioMeloJunior commented May 13, 2024

EclesioMeloJunior commented Jun 26, 2024

(lib/runtime): fail to execute block 17088245 allocator out of space #3774

(lib/runtime): fail to execute block 17088245 allocator out of space #3774

Comments

EclesioMeloJunior commented Feb 26, 2024 • edited Loading

EclesioMeloJunior commented Mar 11, 2024

EclesioMeloJunior commented Mar 18, 2024 • edited Loading

EclesioMeloJunior commented Apr 3, 2024

EclesioMeloJunior commented Apr 5, 2024

EclesioMeloJunior commented May 13, 2024

EclesioMeloJunior commented Jun 26, 2024

(lib/runtime): fail to execute block `17088245` allocator out of space #3774

(lib/runtime): fail to execute block `17088245` allocator out of space #3774

EclesioMeloJunior commented Feb 26, 2024 •

edited

Loading

EclesioMeloJunior commented Mar 18, 2024 •

edited

Loading