Redesign the indexing process to sync bitcoin faster #180

Open
AlexITC opened this issue Aug 29, 2020 · 0 comments
Labels
help wanted (Extra attention is needed), roadmap (A feature that will be developed), server (Changes required on the server project)

AlexITC (Collaborator) commented Aug 29, 2020

We need to optimize the blockchain syncing process.

Expected behavior

Indexing Bitcoin should take at most 1 week.

Actual behavior

Indexing Bitcoin takes months!

Steps to reproduce the behavior

Set up a full Bitcoin node and follow the steps to sync the explorer with it.

Notes

This is a very complex task that involves work on both the infra side and the backend.

On the infra side, we need to put a load balancer in front of the bitcoind RPC API. Based on previous experience, the minimum requirements are:

  • 3 bitcoind instances.
  • Each node has 8 CPUs, 8GB of RAM, and an SSD.
  • The bitcoind config should be tweaked to accept lots of concurrent calls.
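To illustrate the load-balancing idea, here is a minimal sketch of a client-side round-robin over several bitcoind RPC endpoints, with failover to the next node when one is unreachable. The node URLs, the `RoundRobinRpc` class, and the `send` transport are hypothetical names used only for this example; a dedicated load balancer in front of the nodes would serve the same purpose.

```python
import itertools
from typing import Callable, Iterable


class RoundRobinRpc:
    """Spread JSON-RPC calls across several bitcoind endpoints (hypothetical helper)."""

    def __init__(self, endpoints: Iterable[str], send: Callable[[str, str, list], object]):
        self._endpoints = list(endpoints)
        if not self._endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(self._endpoints)
        self._send = send  # assumed transport: (url, method, params) -> result

    def call(self, method: str, params: list) -> object:
        # Try each endpoint at most once, starting from the next one in rotation.
        last_error = None
        for _ in range(len(self._endpoints)):
            url = next(self._cycle)
            try:
                return self._send(url, method, params)
            except ConnectionError as error:
                last_error = error  # node unreachable, try the next one
        raise last_error


# Fake transport for demonstration: records which node served each call.
served = []


def fake_send(url, method, params):
    if url == "http://node-2:8332":
        raise ConnectionError("node-2 is down")
    served.append(url)
    return {"method": method, "params": params}


rpc = RoundRobinRpc(
    ["http://node-1:8332", "http://node-2:8332", "http://node-3:8332"],
    fake_send,
)
for height in range(4):
    rpc.call("getblockhash", [height])
```

With the fake transport, the calls alternate between node-1 and node-3 while node-2 is down. A real transport would post the JSON-RPC payload over HTTP instead.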

On the explorer side:

  • A large server with lots of CPUs (potentially 32 or 64 at least).
  • The postgres instance should be tuned accordingly. The ideal server capacity is still unknown, but it should handle 2TB of data properly.

On the approach to take, the syncing process should be done in several stages (looks like a good candidate for akka-streams):

  • Block headers (mandatory before any other stage).
  • Transaction headers.
  • Transaction outputs (depends on transaction headers).
  • Transaction inputs (depends on the outputs).
  • Block filter (depends on the outputs).
  • TPoS contracts (depends on the outputs).
  • Block rewards (depends on the outputs, potentially could be synced after the block headers).
  • Address balances (depends on the inputs).
  • Address transaction details (depends on the inputs).
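The stage dependencies listed above form a small graph, which can be sketched as follows. This is only an illustration in Python using the standard `graphlib` module (Python 3.9+), not the akka-streams design itself. The stage identifiers are hypothetical names derived from the list, and transaction headers are assumed to depend only on block headers.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on, following the list above.
STAGE_DEPENDENCIES = {
    "block_headers": set(),
    "transaction_headers": {"block_headers"},  # assumption: only needs block headers
    "transaction_outputs": {"transaction_headers"},
    "transaction_inputs": {"transaction_outputs"},
    "block_filter": {"transaction_outputs"},
    "tpos_contracts": {"transaction_outputs"},
    "block_rewards": {"transaction_outputs"},
    "address_balances": {"transaction_inputs"},
    "address_transaction_details": {"transaction_inputs"},
}


def parallel_waves(dependencies):
    """Group stages into waves; stages in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(STAGE_DEPENDENCIES)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The fourth wave contains four independent stages, so that is where running stages concurrently would pay off most. A streaming implementation could go further and pipeline the stages per block instead of waiting for a whole wave to finish.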

As we don't require all of the data to be indexed, ideally we should be able to disable some stages to speed up the process and save space. These are good candidates (sql tables):

  • balances.
  • tpos_contracts.
  • block_rewards.
  • address_transaction_details.
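One way to make stages optional is to validate the set of disabled stages against the dependency graph before syncing starts. The sketch below assumes hypothetical stage names that match the table names above; `enabled_stages` and the `DEPENDS_ON` map are illustrative, not part of the existing codebase.

```python
# Stages in sync order; names are hypothetical and mirror the sql tables where possible.
ALL_STAGES = [
    "block_headers",
    "transaction_headers",
    "transaction_outputs",
    "transaction_inputs",
    "block_filter",
    "tpos_contracts",
    "block_rewards",
    "balances",
    "address_transaction_details",
]

# Direct dependencies of each stage (stages not listed have none).
DEPENDS_ON = {
    "transaction_headers": {"block_headers"},
    "transaction_outputs": {"transaction_headers"},
    "transaction_inputs": {"transaction_outputs"},
    "block_filter": {"transaction_outputs"},
    "tpos_contracts": {"transaction_outputs"},
    "block_rewards": {"transaction_outputs"},
    "balances": {"transaction_inputs"},
    "address_transaction_details": {"transaction_inputs"},
}


def enabled_stages(disabled):
    """Return the stages to run, rejecting configs that break a dependency."""
    disabled = set(disabled)
    unknown = disabled - set(ALL_STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    enabled = [stage for stage in ALL_STAGES if stage not in disabled]
    for stage in enabled:
        missing = DEPENDS_ON.get(stage, set()) & disabled
        if missing:
            raise ValueError(f"{stage} needs disabled stage(s): {sorted(missing)}")
    return enabled


# Disabling the four candidate tables leaves the core pipeline intact.
core = enabled_stages(
    {"balances", "tpos_contracts", "block_rewards", "address_transaction_details"}
)
```

All four candidates are leaves of the dependency graph, which is why they can be switched off without affecting any other stage.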

All of this would affect the exposed API, because we shouldn't return blocks that aren't fully synced. It's also important to consider potential rollbacks while syncing the data.
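A minimal sketch of how the API could decide which blocks are safe to return: track the highest synced block per stage, expose only the minimum across stages, and lower every stage on a rollback. The `SyncProgress` class and its method names are assumptions made for this example; a real implementation would persist this state in the database.

```python
class SyncProgress:
    """Tracks the highest synced block height per stage (hypothetical helper)."""

    def __init__(self, stages):
        # -1 means the stage has not synced any block yet.
        self._height = {stage: -1 for stage in stages}

    def mark_synced(self, stage, height):
        self._height[stage] = max(self._height[stage], height)

    def visible_height(self):
        # A block is fully synced only once every stage has processed it.
        return min(self._height.values())

    def rollback_to(self, height):
        # After a reorg, no stage may claim blocks above the rollback point.
        self._height = {
            stage: min(synced, height) for stage, synced in self._height.items()
        }


progress = SyncProgress(["block_headers", "transaction_headers", "transaction_outputs"])
progress.mark_synced("block_headers", 100)
progress.mark_synced("transaction_headers", 90)
progress.mark_synced("transaction_outputs", 80)

before_rollback = progress.visible_height()  # limited by the slowest stage
progress.rollback_to(70)
after_rollback = progress.visible_height()
```

Only stages that are enabled should be registered here; otherwise a disabled stage would keep the visible height at -1 forever.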
