Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async Deletion of Previous Metadata and Statistics Files #312

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

danielhumanmod
Copy link

@danielhumanmod danielhumanmod commented Sep 22, 2024

Description

Currently, when handling DROP TABLE PURGE, Polaris only deletes data files and the current metadata in TableCleanupTaskHandler. However, other related files, such as historical metadata.json files, stat files, and others, are not deleted.

Iceberg has addressed this issue (see Iceberg PR #9305), but Polaris doesn’t adopt this solution due to factors like asynchronous deletion. This PR introduces additional deletions for the following files:

  • Previous metadata files
  • Statistic files

Fixes #289

Included Changes

  • Updated TableCleanupTaskHandler.java to schedule previous metadata and statistics deletion in batch
  • Extended ManifestFileCleanupTaskHandler to delete metadata and statistics file batch
  • Extended TaskTestUtils.java to write metadata with previous metadata and statistics files
  • Add unit tests in TableCleanupTaskHandlerTest.java and TableContentCleanupTaskHandlerTest

Recommended Review Order:
TableCleanupTaskHandler.java -> ManifestFileCleanupTaskHandler.java -> TaskTestUtils.java -> TableCleanupTaskHandlerTest.java and ManifestFileCleanupTaskHandlerTest

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Unit Test:

  • TableCleanupTaskHandlerTest
  • ManifestFileCleanupTaskHandlerTest

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

Comment on lines 170 to 172
tableMetadata.snapshots().stream()
.map(Snapshot::manifestListLocation)
.forEach(fileIO::deleteFile);
Copy link
Contributor

@flyrain flyrain Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've already handled the manifest files and manifest list files.

Copy link
Author

@danielhumanmod danielhumanmod Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've already handled the manifest files and manifest list files.

Thank you for bringing that up! I’ve just noticed that these are already handled in the ManifestFileCleanupTaskHandler. Have removed theses redundant deletion in the latest commit. Appreciate your input.

@danielhumanmod danielhumanmod marked this pull request as ready for review September 24, 2024 06:03
@danielhumanmod danielhumanmod changed the title Delete all related files in TableCleanUpTask Delete prev metadata and stat files in TableCleanUpTask Sep 24, 2024
Comment on lines 163 to 168
tableMetadata.previousFiles().stream()
.map(TableMetadata.MetadataLogEntry::file)
.forEach(fileIO::deleteFile);
tableMetadata.statisticsFiles().stream()
.map(StatisticsFile::path)
.forEach(fileIO::deleteFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either add retries for these or submit them as separate tasks. As is, if one of these files fails to delete, we'll retry the whole task and resubmit a new task for each manifest. If the manifests are already deleted when we retry, we'll get stuck in a retry loop.

Copy link
Author

@danielhumanmod danielhumanmod Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, Michael! I could refactor the deletion process by adding retry strategy, similar to the tryDelete in ManifestFileCleanupTaskHandler.

Personally, I think using a retry mechanism here might be more effective than creating separate tasks. With separate tasks,, there’s still a risk of issues like task creation failures, which could result in skipping the entire task (which contains multiple files). By using retries within the ManifestFileCleanupTaskHandler, we can manage failure handling at the file level, ensuring that each file is retried independently. This way, if a file deletion fails, we can retry just that file without needing to resubmit or skip the others files. This approach can offers more granular control over handling failures.

I’m open to your thoughts on this! Does this seem aligned with what you were suggesting, or do you see potential advantages in the separate task approach that I might be overlooking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, do you mean adding retry within the TableCleanupTaskHandler.java? If so, that's fine with me. I'd at least use an Executor to attempt the execution in parallel (i.e., so file2 isn't blocked during the delay between retries for file1).

TBH, I'm not very familiar with how Iceberg generates the statistic files. Does there tend to be one per snapshot? one per data file? If the latter, we could be talking about a very large number of files. If that's the case, I think submitting separate tasks with batches of files makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there only one statistics file per snapshot? The spec is not clear:

A table can contain many statistics files associated with different table snapshots.

Unlike the partition statistics file, which is very clear:

Each table snapshot may be associated with at most one partition statistics file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are per snapshots, https://iceberg.apache.org/spec/#table-statistics.

Could be multiple statistics files as the following code shows:

https://github.com/apache/iceberg/blob/5439cbdb278232779fdd9a392bbf57f007f9bda0/core/src/main/java/org/apache/iceberg/TableMetadata.java#L535-L535

Thank you for providing the context, @flyrain :)

Sorry, do you mean adding retry within the TableCleanupTaskHandler.java? If so, that's fine with me. I'd at least use an Executor to attempt the execution in parallel (i.e., so file2 isn't blocked during the delay between retries for file1).

TBH, I'm not very familiar with how Iceberg generates the statistic files. Does there tend to be one per snapshot? one per data file? If the latter, we could be talking about a very large number of files. If that's the case, I think submitting separate tasks with batches of files makes sense.

@collado-mike Yes, if we go for retries, the logic will be within TableCleanupTaskHandler.java. But considering the information provided by Yufei, maybe separate tasks will be a more appropriate approach for stats file deletion?

LOGGER.error("Unable to delete data files from manifest {}", manifestFile.path(), e);
return false;
}
private boolean cleanUpManifestFile(
Copy link
Author

@danielhumanmod danielhumanmod Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lots of changes here, but don’t worry—it’s mainly because I refactored the deletion logic for the manifest and all its data into a new method; no other changes were made in lines 91-135.

TableIdentifier tableId = cleanupTask.getTableId();
try (FileIO authorizedFileIO = fileIOSupplier.apply(task)) {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine manifest and content file cleanup in the same task, but ensure that each task processes only one type of data (either manifest or content files) at a time.

* files in a manifest and the manifest itself. Since data files may be present in multiple
* manifests across different snapshots, we assume a data file that doesn't exist is missing because
* it was already deleted by another task. 2. Table content files: It contains previous metadata and
* statistics files, which are grouped and deleted in batch
*/
public class ManifestFileCleanupTaskHandler implements TaskHandler {
Copy link
Author

@danielhumanmod danielhumanmod Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming this task will triger lots of relevent changes. If a rename is needed, we may want to handle it in a separate PR to avoid too much changes (Leave a TODO here)

@@ -226,4 +233,228 @@ public void deleteFile(String location) {
assertThatPredicate((String f) -> TaskUtils.exists(f, fileIO)).rejects(dataFile2Path);
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases in this file primarily verify theses cleanup:

  1. straightforward cleanup
  2. cleanup when files do not exist
  3. cleanup with retries

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases in this file primarily verify theses scenarios:

  1. Cleanup of a single metadata file with a single snapshot
  2. Cleanup of a single metadata file across multiple snapshots
  3. Cleanup of multiple metadata files across multiple snapshots (New)

@@ -76,7 +79,13 @@ public void testTableCleanup() throws IOException {
TestSnapshot snapshot =
TaskTestUtils.newSnapshot(fileIO, "manifestList.avro", 1, snapshotId, 99L, manifestFile);
String metadataFile = "v1-49494949.metadata.json";
TaskTestUtils.writeTableMetadata(fileIO, metadataFile, snapshot);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Cleanup of a single metadata file with a single snapshot

@@ -303,7 +323,20 @@ public void testTableCleanupMultipleSnapshots() throws IOException {
manifestFile1,
manifestFile3); // exclude manifest2 from the new snapshot
String metadataFile = "v1-295495059.metadata.json";
TaskTestUtils.writeTableMetadata(fileIO, metadataFile, snapshot, snapshot2);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Cleanup of a single metadata file across multiple snapshots

}
}

@Test
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Cleanup of multiple metadata files across multiple snapshots (New)

Copy link
Author

@danielhumanmod danielhumanmod Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file primarily attach previous metadata and statistics filewhen writing metadata

@danielhumanmod
Copy link
Author

Thanks @danielhumanmod for working on it! left some coments.

Thanks for the suggestions @flyrain , I've updated the code based on your comments and added some comment to help the code easier to understand. I’d appreciate any further feedback, and thanks for your time and patience!

@flyrain flyrain requested a review from adutra as a code owner November 9, 2024 05:45
Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, @danielhumanmod. Left some comments.

fileIO.deleteFile(tableEntity.getMetadataLocation());

return true;
}
}
return false;
}

private List<List<String>> getContentFileBatch(TableMetadata tableMetadata, FileIO fileIO) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name contentFile in Iceberg refers to a data file or a delete file. Let's pick another name to avoid confusion, as metadata files and statistic files are not content files. How about metadataFile?


private List<List<String>> getContentFileBatch(TableMetadata tableMetadata, FileIO fileIO) {
List<List<String>> result = new ArrayList<>();
List<String> contentFiles =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename here

Stream.concat(
tableMetadata.previousFiles().stream().map(TableMetadata.MetadataLogEntry::file),
tableMetadata.statisticsFiles().stream().map(StatisticsFile::path))
.filter(file -> TaskUtils.exists(file, fileIO))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check the file existence here? It will be checked at delete time, right?

.toList();
});

Stream<TaskEntity> contentFileCleanupTasks =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> metadataFileCleanupTask

@@ -102,7 +106,7 @@ public boolean handleTask(TaskEntity cleanupTask) {
// read the manifest list for each snapshot. dedupe the manifest files and schedule a
// cleanupTask
// for each manifest file and its data files to be deleted
List<TaskEntity> taskEntities =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is quite long. Can we split it to multiple methods like this?

 var manifestFileCleanupTask = getTaskStream(...);
 var metadataFileCleanupTasks = getTaskEntityStream(cleanupTask, tableMetadata, fileIO, tableEntity, metaStoreManager, polarisCallContext);

var taskEntities = Stream.concat(manifestCleanupTasks, contentFileCleanupTasks).toList();

@@ -185,12 +236,18 @@ private CompletableFuture<Void> tryDelete(
public static final class ManifestCleanupTask {
private TableIdentifier tableId;
private String manifestFileData;
private List<String> contentFileBatch;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename here

Comment on lines 142 to 147
LOGGER
.atWarn()
.addKeyValue("contentFileBatch", contentFileBatch.toString())
.addKeyValue("tableId", tableId)
.log("Table content cleanup task scheduled, but the none of the file in batch exists");
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to warn here if 1 out of 10 files doesn't exist?

Comment on lines 167 to 171
LOGGER
.atInfo()
.addKeyValue("contentFileBatch", contentFileBatch.toString())
.addKeyValue("tableId", tableId)
.log("Content file batch deletion has completed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can either remove it or make it debug level. An info level log may not be necessary.

Comment on lines 160 to 164
LOGGER
.atWarn()
.addKeyValue("contentFileBatch", contentFileBatch.toString())
.addKeyValue("tableId", tableId)
.log("Exception detected during content file batch deletion", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw here?

* files in a manifest and the manifest itself. Since data files may be present in multiple
* manifests across different snapshots, we assume a data file that doesn't exist is missing because
* it was already deleted by another task. 2. Table content files: It contains previous metadata and
* statistics files, which are grouped and deleted in batch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle the same for partition statistics files also (in this PR or as a follow up)
more details: apache/iceberg#9409

Copy link
Author

@danielhumanmod danielhumanmod Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle the same for partition statistics files also (in this PR or as a follow up) more details: apache/iceberg#9409

Thanks for your suggestion! Would it be okay to address this in a separate PR avoid making this one too large? (Left a comment on the code, I can create an issue to track it if necessary)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me.

- 1. renaming
- 2. add log and exception handling
- 3. remove unnecessary log
- 1. renaming
- 2. extract task entities creation into methods
- 3. remove unnecessary filtering
fileIO.deleteFile(tableEntity.getMetadataLocation());

return true;
}
}
return false;
}

Copy link
Author

@danielhumanmod danielhumanmod Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract previous manifest task creation into a new method, no new change added for line 152 - 201

.build();
});
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code blow this line is the new changes

@danielhumanmod
Copy link
Author

Thanks for your review and suggestions @flyrain @ajantha-bhat ! I've made updates to the code based on your comments and would greatly appreciate any additional feedback. I understand this PR is a bit long—thank you again for your patience and time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DROP TABLE with PURGE does not delete metadata.json files
4 participants