feat: Implement background job to remove unused models (#1098)

DerDreschner wants to merge 1 commit.
Conversation
Signed-off-by: David Dreschner <david.dreschner@nextcloud.com>
Force-pushed from `0bcaad8` to `8a30eae`
```php
 * Should match the number of models consumed by the statistics
 * API.
 */
private const MODELS_TO_KEEP = 14;
```
Can we make the stats API and this job use the same constant to avoid them getting out of sync?
Sure! Wasn't sure if a separation would be preferred 😄
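One way to share the constant could look like this. This is only a sketch; the constant's final name and home class are assumptions, not part of the PR:

```php
// Sketch (assumed location): define the threshold once, e.g. on ModelStore...
final class ModelStore {
    /** Number of recent models to keep per address type. */
    public const MODELS_TO_KEEP = 14;
}

// ...and reference it from both the stats API and the cleanup job,
// so the two can never drift apart:
$this->modelStore->cleanup(ModelStore::MODELS_TO_KEEP);
```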
```php
// Run once per day
$this->setInterval(24 * 60 * 60);
$this->setTimeSensitivity(self::TIME_INSENSITIVE);
$this->allowParallelRuns = false;
```
Best use `setAllowParallelRuns(false)` here.
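Applied to the constructor, the change is small. This sketch assumes the usual shape of a Nextcloud `TimedJob` constructor; the injected `ModelStore` parameter is an assumption based on the PR description:

```php
// Sketch of the job constructor using the setter instead of
// writing the allowParallelRuns property directly.
public function __construct(ITimeFactory $time, private ModelStore $modelStore) {
    parent::__construct($time);
    $this->setInterval(24 * 60 * 60); // run once per day
    $this->setTimeSensitivity(self::TIME_INSENSITIVE);
    $this->setAllowParallelRuns(false); // setter provided by the parent Job class
}
```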
Pull request overview
This PR adds an automated cleanup mechanism to prevent the appdata/models directory and corresponding DB table from growing indefinitely by deleting older, unused persisted models on a daily schedule.
Changes:
- Added a daily timed background job to trigger model cleanup while keeping a fixed number of recent models per IP address type.
- Implemented `ModelStore::cleanup()` to remove old model files, evict cache entries, and delete DB records.
- Added `ModelMapper::findOld()` and comprehensive unit tests covering cleanup behavior (file missing, folder missing, cache eviction, etc.).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `lib/BackgroundJob/CleanupModelsJob.php` | New timed job that runs daily and invokes model cleanup. |
| `lib/Service/ModelStore.php` | Adds cleanup logic to delete older persisted models from appdata, cache, and DB. |
| `lib/Db/ModelMapper.php` | Adds query helper to retrieve models older than the "keep" threshold per address type. |
| `tests/Unit/Service/ModelStoreTest.php` | Adds unit tests validating cleanup behavior and cache eviction. |
| `appinfo/info.xml` | Registers the new background job with the app. |
```php
$query = $qb->select('*')
    ->from($this->getTableName())
    ->where($qb->expr()->eq('address_type', $qb->createNamedParameter($addressType)))
    ->setFirstResult($numberOfModelsToKeep)
    ->orderBy('created_at', 'desc');
```
findOld() orders only by created_at (an integer timestamp). If multiple models share the same created_at value (same second), the ordering becomes non-deterministic and the cleanup may delete a model that should have been kept. Add a secondary, deterministic sort key (e.g., id descending) to ensure the “most recent N” models are consistently preserved.
As the model generation takes longer than a second, this shouldn't happen at all. But I'll add it anyway just to be on the safe side.
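The agreed fix could look like the following, using `addOrderBy()` from Nextcloud's query builder; the rest of the query is as shown in the PR:

```php
$query = $qb->select('*')
    ->from($this->getTableName())
    ->where($qb->expr()->eq('address_type', $qb->createNamedParameter($addressType)))
    ->setFirstResult($numberOfModelsToKeep)
    ->orderBy('created_at', 'desc')
    // Secondary, deterministic sort key: breaks ties when two models
    // share the same created_at second, so the "most recent N" are stable.
    ->addOrderBy('id', 'desc');
```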
```php
public function cleanup(int $numberOfModelsToKeep): void {
    $oldModels = array_merge(
        $this->modelMapper->findOld($numberOfModelsToKeep, Ipv4Strategy::getTypeName()),
        $this->modelMapper->findOld($numberOfModelsToKeep, IpV6Strategy::getTypeName())
    );
```
cleanup() loads all old models for both address types into memory at once (findOld() without a limit, then array_merge). On long-running instances this can produce a very large array and make the daily background job slow or memory-heavy. Consider deleting in batches (e.g., fetch a limited page of old models/ids, delete, repeat) so runtime and memory stay bounded.
True, but I assume that the number of entries is manageable? It's usually 2 models (IPv4/IPv6) per day... To get more than 10,000 entries in the database, you would need to have the app activated for over 13 years.
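If batching were wanted anyway, it could be sketched like this. Note the `$limit` parameter on `findOld()` and the `deleteModel()` helper are hypothetical, not part of this PR:

```php
// Hypothetical batched variant: page through old models so memory
// and per-run work stay bounded regardless of instance age.
private const BATCH_SIZE = 500; // assumed batch size

// Assumes findOld() gained an optional $limit parameter (not in the PR).
do {
    $batch = $this->modelMapper->findOld(
        $numberOfModelsToKeep, $addressType, self::BATCH_SIZE
    );
    foreach ($batch as $oldModel) {
        $this->deleteModel($oldModel); // hypothetical per-model cleanup helper
    }
    // A full batch means there may be more old models left; a short
    // batch means we reached the end.
} while (count($batch) === self::BATCH_SIZE);
```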
```php
foreach ($oldModels as $oldModel) {
    $id = $oldModel->getId();
    $this->logger->debug("Cleaning up old model $id");

    // Remove the model file from app data
    if (isset($modelsFolder)) {
        try {
            $modelFile = $modelsFolder->getFile((string)$id);
            $modelFile->delete();
        } catch (NotFoundException) {
            $this->logger->debug("Model file $id not found in app data, skipping file deletion");
        }
    }

    // Evict from cache
    if (isset($cache)) {
        $cache->remove($this->getCacheKey($id));
    }

    // Remove the DB record
    $this->modelMapper->delete($oldModel);
}

$numberOfOldModels = count($oldModels);
$this->logger->info("Cleaned up {$numberOfOldModels} old model(s)");
```
cleanup() will abort the entire cleanup run if deleting a single model fails (e.g., file deletion throwing NotPermittedException or DB deletion throwing). For a maintenance/background task this makes cleanup brittle: one problematic record can prevent all subsequent old models from being removed. Consider wrapping the per-model cleanup body in a try/catch, logging the failure (with model id), and continuing with the next model (optionally still attempting DB deletion even if file deletion fails).
Suggested change (wrap the per-model work in a try/catch and count successes):

```php
$numberOfCleanedModels = 0;
foreach ($oldModels as $oldModel) {
    $id = $oldModel->getId();
    try {
        $this->logger->debug("Cleaning up old model $id");

        // Remove the model file from app data
        if (isset($modelsFolder)) {
            try {
                $modelFile = $modelsFolder->getFile((string)$id);
                $modelFile->delete();
            } catch (NotFoundException) {
                $this->logger->debug("Model file $id not found in app data, skipping file deletion");
            } catch (\Throwable $e) {
                $this->logger->warning("Failed to delete model file $id from app data: " . $e->getMessage(), [
                    'exception' => $e,
                    'modelId' => $id,
                ]);
            }
        }

        // Evict from cache
        if (isset($cache)) {
            $cache->remove($this->getCacheKey($id));
        }

        // Remove the DB record
        $this->modelMapper->delete($oldModel);
        $numberOfCleanedModels++;
    } catch (\Throwable $e) {
        $this->logger->warning("Failed to clean up old model $id: " . $e->getMessage(), [
            'exception' => $e,
            'modelId' => $id,
        ]);
    }
}

$this->logger->info("Cleaned up {$numberOfCleanedModels} old model(s)");
```
```php
try {
    $modelsFolder = $this->appData->getFolder(self::APPDATA_MODELS_FOLDER);
} catch (NotFoundException) {
    $this->logger->debug('Models folder does not exist, skipping file deletion');
```
No, in this case the database still has the models we'll clean up, so both stay consistent with each other. The other way around (clean database, but model files left in appdata) isn't supported right now. Not sure if that should be covered as well... 🤔 What do you think?
Notice: Shouldn't be handled right now as it's an error case anyway.
At the moment, the generated models in the `appdata` directory are not being removed at all. That leads to a pretty large directory when the app is enabled for a longer time (e.g., 200 MB in my personal instance with 1 1/2 years of data).

To solve this, I created a background job that runs once a day and keeps only the minimum number of models needed by the statistics API, both in the database and in the `appdata` directory.

AI-assisted: Claude Code (claude-opus-4-6) with several manual changes and reviews