4454 - Added deletion to bulk import #4629

Merged
Rob9786 merged 2 commits into develop from 4454-bulk-import-jobs-are-never-deleted-from-s3
Apr 16, 2025

Rob9786
Collaborator

@Rob9786 Rob9786 commented Apr 10, 2025

Make sure you have checked all steps below.

Issue

Tests

  • My PR adds the following tests OR does not need testing for this extremely good reason:
    • shouldDeleteJsonFileAfterImport

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it, or I have linked to a
    separate issue for that below.
  • If I have added or removed any dependencies from the project, I have updated the NOTICES file.

@Rob9786 Rob9786 linked an issue Apr 10, 2025 that may be closed by this pull request
@Rob9786 Rob9786 changed the title Added deletion to bulk import Draft: 4454 - Added deletion to bulk import Apr 11, 2025
@Rob9786
Collaborator Author

Rob9786 commented Apr 15, 2025

Hi Tristan,

When the integration test runs, it triggers three import drivers: BulkImportJobDataframeDriver, BulkImportJobRDDDriver and BulkImportDataframeLocalSortDriver.

Each of these has a main method that calls BulkImportDriver.start, passing in args.

BulkImportDriver.start then extracts the configBucket from the args and loads the InstanceProperties from that bucket. The InstanceProperties contain the location of the BULK_IMPORT_BUCKET, which is used in loadJob() to create the JSON file that then needs to be deleted.

In the delete test I need to check the contents of BULK_IMPORT_BUCKET to make sure the JSON file has been deleted. I'm trying to find out what the value of BULK_IMPORT_BUCKET needs to be.
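
To make the check I have in mind concrete, here is a rough, self-contained sketch. It is not the project code: InMemoryBucket, jobKey, the "bulk_import/" key prefix and runImportThenDelete are all hypothetical stand-ins I made up for illustration, and a map takes the place of the real S3 bucket. It only shows the shape of the assertion: the job file is in the bulk import bucket before the import and absent afterwards.

```java
import java.util.HashMap;
import java.util.Map;

public class BulkImportJobDeletionSketch {

    /** Hypothetical stand-in for the bulk import bucket: object key -> JSON body. */
    public static final class InMemoryBucket {
        private final Map<String, String> objects = new HashMap<>();

        public void put(String key, String body) {
            objects.put(key, body);
        }

        public String get(String key) {
            return objects.get(key);
        }

        public void delete(String key) {
            objects.remove(key);
        }

        public boolean contains(String key) {
            return objects.containsKey(key);
        }
    }

    /** Hypothetical key layout; the real layout in the project may differ. */
    public static String jobKey(String jobId) {
        return "bulk_import/" + jobId + ".json";
    }

    /**
     * Reads the job JSON from the bucket, pretends to run the import,
     * then deletes the job file so it does not accumulate in the bucket.
     */
    public static String runImportThenDelete(InMemoryBucket bucket, String jobId) {
        String key = jobKey(jobId);
        String json = bucket.get(key);
        if (json == null) {
            throw new IllegalStateException("No job file found at " + key);
        }
        // The actual import would run here; this sketch skips it.
        bucket.delete(key);
        return json;
    }

    public static void main(String[] args) {
        InMemoryBucket bucket = new InMemoryBucket();
        bucket.put(jobKey("my-job"), "{\"id\":\"my-job\"}");

        System.out.println("before: " + bucket.contains(jobKey("my-job"))); // before: true
        runImportThenDelete(bucket, "my-job");
        System.out.println("after: " + bucket.contains(jobKey("my-job")));  // after: false
    }
}
```

In the real test (shouldDeleteJsonFileAfterImport) the bucket name would have to come from the same InstanceProperties that the driver loads, which is the value I am trying to pin down.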

@Rob9786 Rob9786 force-pushed the 4454-bulk-import-jobs-are-never-deleted-from-s3 branch from 908aab8 to b39f47d Compare April 15, 2025 13:37
@Rob9786 Rob9786 marked this pull request as ready for review April 15, 2025 13:37
@Rob9786 Rob9786 changed the title Draft: 4454 - Added deletion to bulk import 4454 - Added deletion to bulk import Apr 15, 2025
@rtjd6554 rtjd6554 self-assigned this Apr 15, 2025
@Rob9786 Rob9786 merged commit 4ff2a39 into develop Apr 16, 2025
3 checks passed
Development

Successfully merging this pull request may close these issues.

Bulk import jobs are never deleted from S3
2 participants