Skip to content

feat: Implement background job to remove unused models#1098

Open
DerDreschner wants to merge 1 commit into
masterfrom
feat/implement-models-cleanup
Open

feat: Implement background job to remove unused models#1098
DerDreschner wants to merge 1 commit into
masterfrom
feat/implement-models-cleanup

Conversation

@DerDreschner
Copy link
Copy Markdown
Collaborator

At the moment, the generated models in the appdata directory are not being removed at all. That leads to a pretty large directory when the app is enabled for a longer time (e.g., 200MB in my personal instance with 1 1/2 years of data).

To solve this, I created a background job that runs once a day and only keep the minimum amount to use the statistics API in the database as well as the appdata directory.

AI-assisted: Claude Code (claude-opus-4-6) with several manual changes and reviews

Copilot AI review requested due to automatic review settings April 14, 2026 09:11
Signed-off-by: David Dreschner <david.dreschner@nextcloud.com>
@DerDreschner DerDreschner force-pushed the feat/implement-models-cleanup branch from 0bcaad8 to 8a30eae Compare April 14, 2026 09:14
* Should match the number of models consumed by the statistics
* API.
*/
private const MODELS_TO_KEEP = 14;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the stats API and this job use the same constant to avoid them getting out of sync?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Wasn't sure if a separation would be preferred 😄

// Run once per day
$this->setInterval(24 * 60 * 60);
$this->setTimeSensitivity(self::TIME_INSENSITIVE);
$this->allowParallelRuns = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best use setAllowParallelRuns(false) here

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an automated cleanup mechanism to prevent the appdata/models directory and corresponding DB table from growing indefinitely by deleting older, unused persisted models on a daily schedule.

Changes:

  • Added a daily timed background job to trigger model cleanup while keeping a fixed number of recent models per IP address type.
  • Implemented ModelStore::cleanup() to remove old model files, evict cache entries, and delete DB records.
  • Added ModelMapper::findOld() and comprehensive unit tests covering cleanup behavior (file missing, folder missing, cache eviction, etc.).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
lib/BackgroundJob/CleanupModelsJob.php New timed job that runs daily and invokes model cleanup.
lib/Service/ModelStore.php Adds cleanup logic to delete older persisted models from appdata, cache, and DB.
lib/Db/ModelMapper.php Adds query helper to retrieve models older than the “keep” threshold per address type.
tests/Unit/Service/ModelStoreTest.php Adds unit tests validating cleanup behavior and cache eviction.
appinfo/info.xml Registers the new background job with the app.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/Db/ModelMapper.php
Comment on lines +77 to +82
$query = $qb->select('*')
->from($this->getTableName())
->where($qb->expr()->eq('address_type', $qb->createNamedParameter($addressType)))
->setFirstResult($numberOfModelsToKeep)
->orderBy('created_at', 'desc');

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findOld() orders only by created_at (an integer timestamp). If multiple models share the same created_at value (same second), the ordering becomes non-deterministic and the cleanup may delete a model that should have been kept. Add a secondary, deterministic sort key (e.g., id descending) to ensure the “most recent N” models are consistently preserved.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the model generation takes longer then a second, this shouldn't happen at all. But I'll add it anyway just to be on the safe side.

Comment on lines +176 to +180
public function cleanup(int $numberOfModelsToKeep): void {
$oldModels = array_merge(
$this->modelMapper->findOld($numberOfModelsToKeep, Ipv4Strategy::getTypeName()),
$this->modelMapper->findOld($numberOfModelsToKeep, IpV6Strategy::getTypeName())
);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup() loads all old models for both address types into memory at once (findOld() without a limit, then array_merge). On long-running instances this can produce a very large array and make the daily background job slow or memory-heavy. Consider deleting in batches (e.g., fetch a limited page of old models/ids, delete, repeat) so runtime and memory stay bounded.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I assume that the number of entries is manageable? It's usually 2 models (IPv4/IPv6) per day... To get more then 10.000 entries in the database, you would need to have the app activated for over 13 years.

Comment on lines +197 to +221
foreach ($oldModels as $oldModel) {
$id = $oldModel->getId();
$this->logger->debug("Cleaning up old model $id");

// Remove the model file from app data
if (isset($modelsFolder)) {
try {
$modelFile = $modelsFolder->getFile((string)$id);
$modelFile->delete();
} catch (NotFoundException) {
$this->logger->debug("Model file $id not found in app data, skipping file deletion");
}
}

// Evict from cache
if (isset($cache)) {
$cache->remove($this->getCacheKey($id));
}

// Remove the DB record
$this->modelMapper->delete($oldModel);
}

$numberOfOldModels = count($oldModels);
$this->logger->info("Cleaned up {$numberOfOldModels} old model(s)");
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup() will abort the entire cleanup run if deleting a single model fails (e.g., file deletion throwing NotPermittedException or DB deletion throwing). For a maintenance/background task this makes cleanup brittle: one problematic record can prevent all subsequent old models from being removed. Consider wrapping the per-model cleanup body in a try/catch, logging the failure (with model id), and continuing with the next model (optionally still attempting DB deletion even if file deletion fails).

Suggested change
foreach ($oldModels as $oldModel) {
$id = $oldModel->getId();
$this->logger->debug("Cleaning up old model $id");
// Remove the model file from app data
if (isset($modelsFolder)) {
try {
$modelFile = $modelsFolder->getFile((string)$id);
$modelFile->delete();
} catch (NotFoundException) {
$this->logger->debug("Model file $id not found in app data, skipping file deletion");
}
}
// Evict from cache
if (isset($cache)) {
$cache->remove($this->getCacheKey($id));
}
// Remove the DB record
$this->modelMapper->delete($oldModel);
}
$numberOfOldModels = count($oldModels);
$this->logger->info("Cleaned up {$numberOfOldModels} old model(s)");
$numberOfCleanedModels = 0;
foreach ($oldModels as $oldModel) {
$id = $oldModel->getId();
try {
$this->logger->debug("Cleaning up old model $id");
// Remove the model file from app data
if (isset($modelsFolder)) {
try {
$modelFile = $modelsFolder->getFile((string)$id);
$modelFile->delete();
} catch (NotFoundException) {
$this->logger->debug("Model file $id not found in app data, skipping file deletion");
} catch (\Throwable $e) {
$this->logger->warning("Failed to delete model file $id from app data: " . $e->getMessage(), [
'exception' => $e,
'modelId' => $id,
]);
}
}
// Evict from cache
if (isset($cache)) {
$cache->remove($this->getCacheKey($id));
}
// Remove the DB record
$this->modelMapper->delete($oldModel);
$numberOfCleanedModels++;
} catch (\Throwable $e) {
$this->logger->warning("Failed to clean up old model $id: " . $e->getMessage(), [
'exception' => $e,
'modelId' => $id,
]);
}
}
$this->logger->info("Cleaned up {$numberOfCleanedModels} old model(s)");

Copilot uses AI. Check for mistakes.
Comment thread lib/BackgroundJob/CleanupModelsJob.php
Comment thread lib/BackgroundJob/CleanupModelsJob.php
try {
$modelsFolder = $this->appData->getFolder(self::APPDATA_MODELS_FOLDER);
} catch (NotFoundException) {
$this->logger->debug('Models folder does not exist, skipping file deletion');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return missing?

Copy link
Copy Markdown
Collaborator Author

@DerDreschner DerDreschner Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noo, in this case, the database has models we'll clean up to make both them consistentent to each other. The other way around (clean database, but model files in appdata) isn't supported right now. Not sure if that should be covered as well... 🤔 What do you think?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice: Shouldn't be handled right now as it's an error case anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants