Saving options to S3/Azureblob in a --output-dir flag #47

Open
sigridjineth opened this issue Aug 26, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@sigridjineth

sigridjineth commented Aug 26, 2023

Is your feature request related to a problem? Please describe.
While cryo supports various data formats, there's no direct option to save data to cloud storage such as S3, Azure Blob Storage, or custom HDFS clusters.

Describe the solution you'd like
Introduce a direct saving option for S3, Azure Blob Storage, and similar backends, so users can store the extracted blockchain data on these platforms without a separate upload step. The command flag could be something like --output-dir s3 or --output-dir azureblob.

Describe alternatives you've considered
Users can currently save data locally and then manually upload it to their preferred cloud storage. However, an integrated solution within cryo would streamline the process and save time.

@sigridjineth sigridjineth changed the title Saving options to S3/Azureblob... Saving options to S3/Azureblob in a --output-dir flag Aug 26, 2023
@kskalski
Contributor

If we reuse --output-dir, which I guess should then point to the provider-specific URI prefix, then we need to decide where to put the intermediate files (e.g. /tmp/subdir) and what to do with them after a successful upload (delete) and after an upload error (delete?).
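One way to make the intermediate-file question concrete is a staging directory whose cleanup policy is an explicit parameter. This is only a sketch and not cryo code: `staging_dir`, `keep_on_error`, and the `cryo_upload_` prefix are hypothetical names chosen for illustration, and whether to delete on upload error is still the open question above.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staging_dir(keep_on_error: bool = False):
    """Yield a temporary directory for intermediate files.

    The directory is deleted once the block finishes successfully.
    If the block raises (for example, the upload fails), the directory
    is deleted unless keep_on_error is True. Hypothetical helper.
    """
    path = Path(tempfile.mkdtemp(prefix="cryo_upload_"))
    try:
        yield path
    except Exception:
        # Upload (or extraction) failed: apply the chosen policy.
        if not keep_on_error:
            shutil.rmtree(path, ignore_errors=True)
        raise
    else:
        # Successful upload: intermediate files are no longer needed.
        shutil.rmtree(path, ignore_errors=True)
```

Keeping the files on error would let a user retry the upload without re-extracting, at the cost of leaving data behind in a temp location.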

@sslivkoff
Member

I think this would be a nice feature to have

I think this is right on the border between "this is complicated enough to belong in a separate tool" and "this would be incredibly convenient so is worth it". I think it's worth it as long as we can get a really robust implementation without too much complexity

how about this as an interface:

  • specify upload location as --upload s3://...
  • if --upload is present and --output-dir is not present, write files to tmp dir before upload, delete files after upload
  • if --upload is present and --output-dir is present, write files to output dir and do not delete them after upload
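The three rules above can be sketched as a small decision function. This is an illustration in Python rather than cryo's actual implementation; `OutputPlan` and `plan_output` are hypothetical names, and falling back to the current directory when neither flag is given is an assumption.

```python
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputPlan:
    write_dir: str             # where files are written locally
    upload_to: Optional[str]   # destination URI, or None if no upload
    delete_after_upload: bool  # whether local files are removed afterwards


def plan_output(upload: Optional[str], output_dir: Optional[str]) -> OutputPlan:
    """Map the --upload / --output-dir combination to a plan (hypothetical)."""
    if upload is None:
        # No upload requested: write locally only.
        # Assumption: default to the current directory.
        return OutputPlan(output_dir or ".", None, False)
    if output_dir is None:
        # --upload only: stage in a temp dir and delete after upload.
        tmp = tempfile.mkdtemp(prefix="cryo_upload_")
        return OutputPlan(tmp, upload, True)
    # Both flags: write to the output dir and keep the files.
    return OutputPlan(output_dir, upload, False)
```

For example, `plan_output("s3://bucket/path/", "./data")` keeps the files in `./data`, while `plan_output("s3://bucket/path/", None)` stages them in a fresh temporary directory that is marked for deletion.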

I imagine calls to the upload function would belong here in the code

@sigridjineth
Author

nice, I'll start work on this after SBC week ends. thanks for your direction @sslivkoff

@sslivkoff
Member

instead of implementing the upload logic in cryo, it should just wrap rclone https://rclone.org/

there are so many edge cases and failure modes and configuration/authentication details that it's good to just lean on rclone's amazing implementation

doing cryo ... --upload s3://bucket/path/ should just mean that cryo proceeds as normal and then afterwards internally runs something like rclone copy $output_dir s3://bucket/path/ (rclone needs a subcommand such as copy, and addresses destinations as remote:path)
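A rough sketch of what wrapping rclone could look like, written in Python for illustration only. `rclone copy SOURCE DEST` is a real rclone subcommand, and rclone addresses destinations as `remote:path`; everything else here is an assumption: the helper names are hypothetical, and the mapping from `s3://bucket/path/` to a remote called `s3` presumes the user has already configured an rclone remote with that name.

```python
import subprocess
from typing import List
from urllib.parse import urlparse


def to_rclone_remote(uri: str, remote_name: str = "s3") -> str:
    """Translate s3://bucket/path/ into rclone's remote:bucket/path/ form.

    Assumption: an rclone remote named `remote_name` is already configured.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"unsupported upload URI: {uri!r}")
    return f"{remote_name}:{parsed.netloc}{parsed.path}"


def rclone_copy_command(output_dir: str, upload_uri: str) -> List[str]:
    """Build the argument list for `rclone copy <output_dir> <remote:path>`."""
    return ["rclone", "copy", output_dir, to_rclone_remote(upload_uri)]


def upload(output_dir: str, upload_uri: str) -> None:
    """Run rclone after extraction finishes; raises if rclone exits non-zero."""
    subprocess.run(rclone_copy_command(output_dir, upload_uri), check=True)
```

Building the command separately from running it keeps the translation step testable without rclone installed, and `check=True` surfaces upload failures to the caller so a cleanup policy can be applied.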

@sslivkoff sslivkoff added the enhancement New feature or request label Oct 11, 2023

3 participants