Skip to content

Conversation

LightFTSO
Copy link

I recently had the indexer database disconnected for a few minutes, and after it went back online, the system-client resumed making normal requests to the clients, however, the fdc-client was failing silently when requesting submitSignatures, and this caused me a huge amount of penalizations on the reward epoch.

These are my logs:
On the system-client:

[04-12|09:41:46.001]	INFO	protocol/protocol_utils.go:147	Calling protocol 100 client API: http://ftso-client:3100/submit1/955787/0x3d71539F5814b3F77e084fE636378AA0B59A38c2
[04-12|09:41:46.001]	INFO	protocol/protocol_utils.go:147	Calling protocol 200 client API: http://fdc-client:8080/fsp/submit1/955787/0x3d71539F5814b3F77e084fE636378AA0B59A38c2
[04-12|09:41:52.191]	INFO	protocol/submitter.go:76	Submitter submit1 successfully sent tx
[04-12|09:42:05.001]	INFO	protocol/protocol_utils.go:147	Calling protocol 100 client API: http://ftso-client:3100/submit2/955787/0x3d71539F5814b3F77e084fE636378AA0B59A38c2
[04-12|09:42:05.001]	INFO	protocol/protocol_utils.go:147	Calling protocol 200 client API: http://fdc-client:8080/fsp/submit2/955787/0x3d71539F5814b3F77e084fE636378AA0B59A38c2
[04-12|09:42:07.015]	INFO	protocol/submitter.go:76	Submitter submit2 successfully sent tx
[04-12|09:42:45.000]	INFO	protocol/submitter.go:269	submitSignatures fetching data for 955787
[04-12|09:42:45.000]	INFO	protocol/protocol_utils.go:168	Calling protocol 200 client API: http://fdc-client:8080/fsp/submitSignatures/955787/0x17E69EeDB52bdc522B7498958323fCaC85d50473
[04-12|09:42:45.000]	INFO	protocol/protocol_utils.go:168	Calling protocol 100 client API: http://ftso-client:3100/submitSignatures/955787/0x17E69EeDB52bdc522B7498958323fCaC85d50473
[04-12|09:42:56.682]	INFO	finalizer/submission_processor.go:85	Threshold reached for protocol 100 in voting round 955787
[04-12|09:42:56.743]	INFO	finalizer/finalizer_queue.go:138	Finalizer with address 0x41800CcAE0B9aBa34F9f581739372f853b54067d will send outside grace period for voting round 955787 for protocol 100
[04-12|09:42:58.016]	INFO	protocol/submitter.go:76	Submitter submitSignatures successfully sent tx
[04-12|09:42:58.016]	INFO	protocol/protocol_utils.go:147	Calling protocol 200 client API: http://fdc-client:8080/fsp/submitSignatures/955787/0x17E69EeDB52bdc522B7498958323fCaC85d50473
[04-12|09:42:58.016]	WARN	protocol/submitter.go:407	Error getting data for submitter submitSignatures: max retries reached: retry
[04-12|09:43:00.017]	INFO	protocol/protocol_utils.go:147	Calling protocol 200 client API: http://fdc-client:8080/fsp/submitSignatures/955787/0x17E69EeDB52bdc522B7498958323fCaC85d50473
[04-12|09:43:00.017]	WARN	protocol/submitter.go:407	Error getting data for submitter submitSignatures: max retries reached: retry
[04-12|09:43:02.016]	INFO	protocol/protocol_utils.go:147	Calling protocol 200 client API: http://fdc-client:8080/fsp/submitSignatures/955787/0x17E69EeDB52bdc522B7498958323fCaC85d50473
[04-12|09:43:02.016]	WARN	protocol/submitter.go:407	Error getting data for submitter submitSignatures: max retries reached: retry
[04-12|09:43:05.001]	INFO	finalizer/finalizer_queue.go:218	Finalizer processes delayed queue item for round 955787 for protocol 100

On the fdc-client:

[04-12|09:39:29.127]	INFO	manager/manager.go:155	Round 955786 created
[04-12|09:39:40.000]	INFO	collector/choose.go:175	bitVote for round 955784 started with on-chain time
[04-12|09:39:40.001]	INFO	collector/choose.go:75	Received 72 bitVotes for round 955784
[04-12|09:40:35.001]	INFO	server/providerServices.go:30	submit2: for round 955786: 0xc8000e958a0003000203
[04-12|09:41:03.128]	INFO	manager/manager.go:155	Round 955787 created
[04-12|09:41:10.000]	INFO	collector/choose.go:175	bitVote for round 955785 started with on-chain time
[04-12|09:41:10.001]	INFO	collector/choose.go:75	Received 73 bitVotes for round 955785
[04-12|09:42:05.001]	INFO	server/providerServices.go:30	submit2: for round 955787: 0xc8000e958b0003000203
[04-12|09:42:29.127]	INFO	manager/manager.go:155	Round 955788 created
[04-12|09:42:40.000]	INFO	collector/choose.go:175	bitVote for round 955786 started with on-chain time
[04-12|09:42:40.001]	INFO	collector/choose.go:75	Received 72 bitVotes for round 955786
[04-12|09:43:35.000]	INFO	server/providerServices.go:30	submit2: for round 955788: 0xc8000e958c0003000203
[04-12|09:43:59.127]	INFO	manager/manager.go:155	Round 955789 created
[04-12|09:44:10.000]	INFO	collector/choose.go:175	bitVote for round 955787 started with on-chain time
[04-12|09:44:10.002]	INFO	collector/choose.go:75	Received 73 bitVotes for round 955787
[04-12|09:45:05.001]	INFO	server/providerServices.go:30	submit2: for round 955789: 0xc8000e958d0003000203
[04-12|09:45:29.128]	INFO	manager/manager.go:155	Round 955790 created
[04-12|09:45:40.000]	INFO	collector/choose.go:175	bitVote for round 955788 started with on-chain time
[04-12|09:45:40.001]	INFO	collector/choose.go:75	Received 72 bitVotes for round 955788
[04-12|09:46:35.001]	INFO	server/providerServices.go:30	submit2: for round 955790: 0xc8000e958e0003000203
[04-12|09:46:57.127]	INFO	manager/manager.go:155	Round 955791 created

As you can see, the system-client complains and retries to get the submitSignatures from the fdc-client, however, the fdc-client fails silently, no error log, nothing to see that something went wrong

This PR aims to at least make it visible when something goes wrong in the submitSignatures controller.
I implore you to take a deeper look

Thanks in advance!

@GrePod
Copy link
Contributor

GrePod commented Apr 18, 2025

Hi, thank you for the contribution.

Regarding the additional logs: in a normal scenario, FSP client always starts queries a bit before FDC client computes the Merkle root and queries until timeout or success. Your new logs would "spam" in a normal scenario, thus they are currently in debug level.

Regarding the logs you provided for round 955787. The timing is very strange:

  • The log Round 955787 created happened when your client first observed a request from that round. The request was submitted 13s into the round.

  • I believe you have set to send submit2 tx 5 seconds from the start of the choose phase. The tx is sent 62s after the round is crated. Normally, it should be after 82s after. From this viewpoint, your indexer is roughly 20s out of sync.

  • You receive bitVotes 125s after submit2 tx, normally it should be 41s after. From this viewpoint, your indexer is roughly 80s out of sync.

We will consider adding some additional checks of c-chain indexer to detect out of sync state and potentially act on it accordingly

@JamesGDiaz
Copy link

JamesGDiaz commented Apr 18, 2025

Understood, that makes sense, I guess the rebuilding of the new indexer database was indeed slowing things down then.

Additional checks for out of sync state on the indexer would be received with a lot of gratitude by everyone in the FTSO community (while you're at it, health check endpoints and prometheus metrics would be amazing too haha)
Thank you @GrePod , I will close this PR, happy Easter!

Edit: Guess I can't close it myself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants