Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up all temp files created during processing and storage #9

Merged
merged 3 commits into from
Sep 2, 2019

Conversation

achempion
Copy link
Member

stavro/arc#285

from @dmarkow

Fixes #256

Arc creates temp files in the directory that System.tmp_dir returns, but never cleans up after itself. Running unchecked on a production server could result in running out of disk space. These temp files are created in three scenarios:

  1. The input is binary data
  2. The input is a remote URL
  3. During transformations for each version

Scenarios 1 and 2 seem very similar (converting something that isn't a local file path into a local file path), however scenario 1 was happening once per transformation (where 10 versions/transformations meant the binary data would be written to disk 10 separate times), whereas scenario 2 was happening once right at the beginning (the URL is retrieved and saved to disk immediately during Arc.File.new/1).

Scenario 3 creates a temp file for each transformation.

Changes:

  • Write binary input to a tempfile one time during Arc.File.new/1 rather than once per transformation (the tradeoff is that even with no transformations, it'll still write once, but this matches how remote URLs are handled)
  • Got rid of Arc.File.ensure_path as it was only there to convert binary data to a path which now happens automatically
  • Add a tempfile? attribute to %Arc.File{} that can be used later for cleanup (I preferred this over just testing the path to see if it's in the temp dir, otherwise there's a chance of deleting the input file itself if it was also in the temp dir)
  • After storing each version, the version's temp file is deleted (scenario 3)
  • After storing all of the versions, the input temp file is deleted if it exists (scenarios 1 and 2)
  • Added tests for checking temp files (had to change some of the setup/teardown logic in the tests, but they all pass including the S3 tests)

@achempion achempion self-assigned this Aug 31, 2019
@achempion achempion marked this pull request as ready for review September 1, 2019 18:07
@achempion
Copy link
Member Author

@dmarkow @kolorahl what do you think?

@achempion achempion merged commit 7cc0a6a into master Sep 2, 2019
@achempion achempion deleted the feature/arc-pull-285 branch September 2, 2019 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants