Use Apache Feather for API object transfer instead of RDS #328

Open
wants to merge 15 commits into
base: v2.3.1


PietrH
Member

@PietrH PietrH commented Oct 17, 2024

This PR requires changes on the Lifewatch RStudio Server before it can work there. I want users on the RStudio Server to be able to use the most recent etn version. If this PR is merged into v2.3.1, that version will require an update of the R version on the Lifewatch server.

This PR serves as a jumping-off point for testing by my co-developers. I'd be very grateful for your input!

Fixes:

  • I now use Apache Arrow (Feather) for file transfers instead of RDS. Feather uses lz4 instead of gzip compression and is also chunked:
    • less memory usage for both client and server
    • faster compression/decompression
    • because we need less memory, we can get away with larger objects
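
As a rough illustration of the format change (not the code in this PR), the sketch below round-trips a data frame through RDS and through Feather with lz4 compression. The `detections` data frame and the temporary file paths are made up for the example; it assumes the arrow package is installed and was built with lz4 support.

```r
# Minimal sketch: the same data frame written as RDS (gzip) and as Feather (lz4).
# `detections` is a hypothetical stand-in for an API result, not real etn data.
library(arrow)

n <- 1e5
detections <- data.frame(
  detection_id = seq_len(n),
  tag_serial_number = rep(c("A1", "B2", "C3", "D4"), length.out = n),
  stringsAsFactors = FALSE
)

rds_path <- tempfile(fileext = ".rds")
feather_path <- tempfile(fileext = ".feather")

# saveRDS() compresses with gzip by default.
saveRDS(detections, rds_path)

# write_feather() writes the table in record batches (see its chunk_size
# argument) and compresses them with lz4 when asked to.
write_feather(detections, feather_path, compression = "lz4")

from_rds <- readRDS(rds_path)
# read_feather() returns a tibble by default; convert for a like-for-like check.
from_feather <- as.data.frame(read_feather(feather_path))

# Both routes should give back the same data.
all.equal(from_rds, from_feather)

# Compare the size on disk of the two files.
file.size(rds_path)
file.size(feather_path)
```

Which file ends up smaller depends on the data; the claim in this PR is about memory use and speed, not about file size.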

Talking points:

  • This doesn't actually allow you to load detections for 2013_albertkanaal: that request crashes before serialisation on the server side, so I can't fix it on the client side
  • Is it actually faster for you?
  • You can test memory usage by looking at your resource manager, or via something like bench::mark()
  • Is the arrow dependency worth it? RDS will become more and more problematic with larger and larger datasets (especially multiple animal_project_codes)
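
For the speed and memory questions above, a local comparison could look something like the sketch below. It only measures reading from local temporary files, so it says nothing about network transfer or server-side serialisation, and the test data is invented. It assumes the bench and arrow packages are installed. Note that `mem_alloc` in bench only counts memory allocated by R itself, so allocations made inside Arrow's C++ library are not included; the resource manager remains the better check for total memory.

```r
# Minimal sketch: time and R-level memory allocation for reading RDS vs Feather.
# The data frame is made up; real detection tables will behave differently.
library(bench)
library(arrow)

n <- 1e5
df <- data.frame(
  x = runif(n),
  y = rep(letters, length.out = n),
  stringsAsFactors = FALSE
)

rds_path <- tempfile(fileext = ".rds")
feather_path <- tempfile(fileext = ".feather")
saveRDS(df, rds_path)
write_feather(df, feather_path, compression = "lz4")

results <- bench::mark(
  rds = readRDS(rds_path),
  feather = read_feather(feather_path),
  check = FALSE,        # results differ in class (data.frame vs tibble)
  min_iterations = 5
)

# Median run time and memory allocated by R for each approach.
results[, c("expression", "median", "mem_alloc")]
```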

Alternative approach

The API result object is currently passed as a single binary stream. Instead, I could split it up into multiple files hosted by OpenCPU, fetch those individually, and combine them on the client. This is more complex and will be more difficult to debug if something goes wrong, but it would allow us to keep using RDS for now. I haven't tested this yet, as it would disrupt the current v2.3.0 beta. There is a dev env on the horizon that would allow tests like this in the future.
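
The split-and-combine idea can be sketched in base R as below. This is untested against OpenCPU: local temporary files stand in for the hosted files, and `split_to_rds()` and `combine_rds()` are hypothetical helper names, not functions in etn.

```r
# Minimal sketch: write a data frame as several RDS files, then read them back
# one at a time and combine. Local temp files stand in for files on the server.

# Hypothetical helper: split `df` into chunks of `rows_per_chunk` rows and
# save each chunk as its own RDS file. Returns the file paths in order.
split_to_rds <- function(df, rows_per_chunk, dir = tempfile("chunks_")) {
  dir.create(dir)
  chunk_id <- ceiling(seq_len(nrow(df)) / rows_per_chunk)
  chunks <- split(df, chunk_id)
  paths <- file.path(dir, sprintf("chunk_%04d.rds", seq_along(chunks)))
  for (i in seq_along(chunks)) {
    saveRDS(chunks[[i]], paths[i])
  }
  paths
}

# Hypothetical helper: read every chunk and bind the rows back together.
combine_rds <- function(paths) {
  combined <- do.call(rbind, lapply(paths, readRDS))
  rownames(combined) <- NULL  # drop the row names carried over from the chunks
  combined
}

df <- data.frame(id = 1:10, value = (1:10) * 2)
paths <- split_to_rds(df, rows_per_chunk = 4)
restored <- combine_rds(paths)

length(paths)            # number of chunk files written
all.equal(df, restored)  # check that the round trip preserved the data
```

Each chunk is serialised and compressed separately, so the peak memory needed for serialisation is tied to the chunk size rather than to the whole result, at the cost of extra requests and more places for things to go wrong.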

@PietrH PietrH linked an issue Oct 17, 2024 that may be closed by this pull request
6 tasks
@PietrH PietrH self-assigned this Oct 17, 2024
@PietrH PietrH changed the base branch from main to v2.3.1 October 17, 2024 11:13
@PietrH PietrH requested a review from peterdesmet October 17, 2024 11:37
@PietrH PietrH assigned sannegovaert and unassigned sannegovaert Oct 17, 2024
@PietrH PietrH requested a review from sannegovaert October 17, 2024 11:38
@PietrH
Member Author

PietrH commented Oct 17, 2024

@sannegovaert, @peterdesmet, could you have a look at this when you have time? You'll have to try it locally, as the RStudio Server doesn't support R 4.0 or Apache Arrow at the moment.

@PietrH
Member Author

PietrH commented Oct 17, 2024

#329 serves as an anchor for upgrading the R version on the Lifewatch RStudio Server. This will open up many more recent R package versions to the users on the server, and will become more and more relevant as tidyverse drops support for R versions lower than 4.

@PietrH
Member Author

PietrH commented Oct 22, 2024

Blocked by #329

@PietrH PietrH marked this pull request as ready for review October 22, 2024 14:07
Development

Successfully merging this pull request may close these issues.

API: change compressed output format for get_val to reduce serialisation cost
2 participants