Issues with Storing and Accessing Large Data (>12-13 Bytes) via Public Gateway #164

Open
Ali-Usama opened this issue Mar 15, 2024 · 4 comments
Assignees
Labels
bug Something isn't working p:normal Normal Priority question Further information is requested

Comments

@Ali-Usama

Description

I'm integrating rust-ipfs into a Substrate blockchain to enable decentralized storage capabilities for our nodes. The integration involves using offchain workers to interact with an IPFS node, managed by rust-ipfs, for storing and retrieving data. While testing this setup, I've encountered an issue where I'm unable to access data larger than approximately 12 to 13 bytes through a public IPFS gateway. Smaller data sizes work as expected and are accessible without issues.

Steps to Reproduce

  • Initialize the IPFS node using rust-ipfs with this configuration.
  • Store data on IPFS using the Substrate offchain worker, which interacts with the rust-ipfs instance.
  • Attempt to access the stored data through a public IPFS gateway (e.g., https://ipfs.io/ipfs/).

Expected Behavior

Data of any size, when stored on IPFS using rust-ipfs through our Substrate blockchain integration, should be retrievable via public IPFS gateways.

Actual Behavior

When attempting to access data larger than 12 to 13 bytes through a public gateway, the request fails (504: Gateway Timeout Error). Smaller data sizes are retrievable without any issues.

Additional Information

Rust-IPFS version: forked rust-ipfs
Substrate version: polkadot-v0.9.43

I suspect this might be related to how rust-ipfs handles data chunking or broadcasting of CID announcements to the IPFS network, particularly for larger data sizes. However, I am not entirely sure if the issue lies within the configuration of the rust-ipfs node, the data storage process, or the retrieval/query mechanism.

Request for Assistance

Could you provide insights or recommendations on how to address this issue? Specifically, I am looking for:

  • Confirmation if this is a known issue with rust-ipfs or if it might be related to my integration approach.
  • Any configuration changes or optimizations that could help in successfully storing and accessing larger data sizes via public IPFS gateways.
  • Best practices for debugging and resolving such issues when integrating rust-ipfs with Substrate blockchains.

Thank you for your support and looking forward to your guidance on resolving this challenge.

@Ali-Usama Ali-Usama added the bug Something isn't working label Mar 15, 2024
@dariusc93
Owner

Hey! Thank you for the report. I haven't done much testing with rust-ipfs and public gateways lately (since that hasn't been a priority for me at the moment), but the last time I tested, I know it sometimes came down to connectivity, whether the content is being provided on the DHT, and the bitswap implementation (your fork uses beetle-bitswap by default, which should work better when dealing with gateways). I didn't have time to do a full review of the code you're using (I can do that later today), but from a quick skim there are some things I can suggest to see if they help:

  1. Use the latest rust-ipfs (0.11 hasn't been published yet, but I have done some optimizations and updates; in your case I would suggest using the beetle-bitswap feature).
  2. Check your firewall to make sure it does not block UPnP and that your machine and network equipment support it; or connect to and listen on a public relay so your local node is dialable.
  3. Though the bitswap implementation used will send an event to provide the CID over the DHT, you can also manually provide those blocks.
  4. On https://github.com/Ali-Usama/substrate/blob/polkadot-v0.9.43/client/offchain/src/api/ipfs.rs#L31C9-L31C73, I would advise decreasing this amount below 2MB (preferably 1MB, or leave it at the default of 256k). This is because the bitswap spec calls for messages not to exceed 2MB, while IPFS suggests blocks be no more than 1MB, so if the block exceeds 2MB (including the size of the protobuf message) it may fail and no blocks would be exchanged.
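
To make the size reasoning in point 4 concrete, here is a minimal sketch of a sanity check on a configured chunk size. It is not part of rust-ipfs: `check_chunk_size`, `ChunkSizeCheck`, and the `framing_overhead` allowance are hypothetical names, and reading "2MB", "1MB", and "256k" as binary units (MiB/KiB) is an assumption.

```rust
/// Bitswap message ceiling from the comment above (assumed to mean 2 MiB).
const MAX_BITSWAP_MESSAGE_BYTES: usize = 2 * 1024 * 1024;
/// Suggested upper bound for a single block (assumed to mean 1 MiB).
const SUGGESTED_MAX_BLOCK_BYTES: usize = 1024 * 1024;
/// Default chunk size from the comment above (assumed to mean 256 KiB).
const DEFAULT_CHUNK_BYTES: usize = 256 * 1024;

/// Outcome of checking a chunk size against the two limits.
#[derive(Debug, PartialEq, Eq)]
enum ChunkSizeCheck {
    /// Within both the suggested block size and the message ceiling.
    Ok,
    /// Fits in one message, but larger than the suggested block size.
    AboveSuggestedBlockSize,
    /// Block plus framing would not fit in a single bitswap message.
    ExceedsBitswapMessage,
}

/// Hypothetical helper: `framing_overhead` is a caller-supplied allowance
/// for the protobuf envelope around the block, not a value from any spec.
fn check_chunk_size(chunk_bytes: usize, framing_overhead: usize) -> ChunkSizeCheck {
    if chunk_bytes.saturating_add(framing_overhead) > MAX_BITSWAP_MESSAGE_BYTES {
        ChunkSizeCheck::ExceedsBitswapMessage
    } else if chunk_bytes > SUGGESTED_MAX_BLOCK_BYTES {
        ChunkSizeCheck::AboveSuggestedBlockSize
    } else {
        ChunkSizeCheck::Ok
    }
}
```

With these assumptions, a chunk of exactly 2 MiB is rejected as soon as any framing overhead is counted, which is why staying at or below 1 MiB leaves comfortable headroom.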

@dariusc93 dariusc93 added the question Further information is requested label Mar 16, 2024
@Ali-Usama
Author

I've updated the node configurations here using the bitswap feature, but I'm still facing the same issue. After adding a boot node, this still returns 0 peers:

let peers = if let IpfsResponse::Peers(peers) = ipfs_request::<T>(IpfsRequest::Peers)? {
    peers
} else {
    Vec::new()
};

So, I'm assuming the issue might be with how the node connects to the IPFS network, but it still doesn't explain why the small datasets are accessible on the public gateways, and as soon as the data size crosses a certain threshold, it becomes inaccessible.

@dariusc93
Owner

Thank you for your response.

> After adding a boot node, this still returns 0 peers

Could you add the other bootstrap nodes and maybe try calling Ipfs::bootstrap after initializing to see if that helps? We don't do this automatically (although the latest rust-libp2p version will likely do so), so adding a boot node only adds it to the peer k-bucket and connects to it; it does not begin bootstrapping.

> So, I'm assuming the issue might be with how the node connects to the IPFS network, but it still doesn't explain why the small datasets are accessible on the public gateways, and as soon as the data size crosses a certain threshold, it becomes inaccessible.

I do find it interesting that it is a problem only after a specific amount of data. Would it also be an issue if you were to run a local gateway and have your instance connect to that gateway instead? Are you connecting over any relays, and if so, does it use dcutr properly? (You would likely have to look at the logs for this, and this assumes that UPnP isn't working or isn't an option in your environment; best to check the firewall and network equipment in that case to be sure.) If it doesn't, the small amount of data might make sense, because relay v2 defaults to about 128k of data before the connection resets, since it expects dcutr to kick in by that time if both peers support the protocol and there aren't any issues preventing its use.
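
As a rough illustration of the relay data limit described above, the sketch below tracks how many bytes a single relayed connection has carried against a fixed budget. This is only a model for reasoning about the numbers: `RelayBudget` and its methods are hypothetical, no libp2p API is used, and treating "about 128k" as 128 KiB is an assumption.

```rust
/// Per-connection data limit for a relayed connection, taken from the
/// "about 128k" figure above (assumed to mean 128 KiB).
const RELAY_DATA_LIMIT_BYTES: u64 = 128 * 1024;

/// Hypothetical byte counter for one relayed connection.
struct RelayBudget {
    used: u64,
    limit: u64,
}

impl RelayBudget {
    fn new(limit: u64) -> Self {
        RelayBudget { used: 0, limit }
    }

    /// Records `n` more bytes on the connection. Returns `false` once the
    /// total has gone past the limit, i.e. the point where the relay
    /// would be expected to reset the connection.
    fn record(&mut self, n: u64) -> bool {
        self.used = self.used.saturating_add(n);
        self.used <= self.limit
    }

    /// Bytes still available before the limit is reached.
    fn remaining(&self) -> u64 {
        self.limit.saturating_sub(self.used)
    }
}
```

Note that everything sent over the relayed connection counts toward the budget in this model, not just the stored payload, so the logs are still the place to confirm whether a direct connection was established.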

@dariusc93
Owner

Did a little testing, and there are only some instances where I've noticed that there aren't as many responses to a gateway, but nothing tied to a specific amount of data.

@dariusc93 dariusc93 self-assigned this Jun 4, 2024
@dariusc93 dariusc93 added the p:normal Normal Priority label Jul 4, 2024

2 participants