fix: clean orphan segments during dataset deletion #37773

Open
xiaobao-k8s wants to merge 1 commit into langgenius:main from xiaobao-k8s:fix/issue-37066-knowledge-storage

Conversation

@xiaobao-k8s

Summary

Fixes #37066.

When document deletion and dataset deletion race in separate async tasks, the document row can be gone before clean_dataset_task runs while dataset-scoped document_segments still remain. clean_dataset_task previously skipped segment cleanup when documents was empty, leaving orphan segments behind. This patch always cleans dataset-scoped segments, even if the document rows are already gone.

Changes

  • Move DocumentSegment cleanup outside the documents existence branch in clean_dataset_task.
  • Add a regression unit test for orphan segment cleanup when no document rows are found.
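
A minimal sketch of the control-flow change described above, using an in-memory SQLite database rather than Dify's actual models or Celery task. The function name `clean_dataset`, the helper `make_db`, and the simplified table schemas are assumptions for illustration only; the point is that the segment delete is keyed on the dataset ID and sits outside the `if documents:` branch.

```python
import sqlite3


def make_db():
    """Create a throwaway schema (hypothetical, much simpler than Dify's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, dataset_id TEXT)")
    conn.execute(
        "CREATE TABLE document_segments "
        "(id TEXT PRIMARY KEY, dataset_id TEXT, document_id TEXT)"
    )
    return conn


def clean_dataset(conn, dataset_id):
    """Illustrative stand-in for clean_dataset_task; returns segments removed."""
    documents = conn.execute(
        "SELECT id FROM documents WHERE dataset_id = ?", (dataset_id,)
    ).fetchall()

    if documents:
        # Document-scoped cleanup still runs only when document rows exist.
        conn.execute("DELETE FROM documents WHERE dataset_id = ?", (dataset_id,))

    # Segment cleanup is scoped by dataset_id and runs unconditionally, so a
    # racing document-delete task cannot leave orphan segments behind.
    cursor = conn.execute(
        "DELETE FROM document_segments WHERE dataset_id = ?", (dataset_id,)
    )
    conn.commit()
    return cursor.rowcount
```

With the segment delete inside the `if documents:` branch instead, the orphan row in the scenario from the summary would survive the cleanup run.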

Test Plan

cd api
uv run pytest -o addopts='' tests/unit_tests/tasks/test_clean_dataset_task.py::TestEdgeCases::test_clean_dataset_task_deletes_orphan_segments_without_documents -q -s
uv run pytest -o addopts='' tests/unit_tests/tasks/test_clean_dataset_task.py -q
cd ..
git diff --check
cd api && uv run ruff check tasks/clean_dataset_task.py tests/unit_tests/tasks/test_clean_dataset_task.py

Real behavior proof

behavior: Dataset deletion cleanup now deletes dataset-scoped document_segments even when the corresponding documents rows have already been deleted by a racing async document-delete task.
environment: Repository langgenius/dify, worktree /datad/github/dify-issue37066, branch fix/issue-37066-knowledge-storage, commit 870bc5f89, Python 3.12 via uv.
steps: Ran the targeted regression test and full clean_dataset_task unit test file after the patch:

cd /datad/github/dify-issue37066/api
uv run pytest -o addopts='' tests/unit_tests/tasks/test_clean_dataset_task.py::TestEdgeCases::test_clean_dataset_task_deletes_orphan_segments_without_documents -q -s
uv run pytest -o addopts='' tests/unit_tests/tasks/test_clean_dataset_task.py -q

evidence: Targeted test output:

.
1 passed, 2 warnings in 1.83s

Full file output:

........                                                                 [100%]
8 passed, 2 warnings in 2.07s

observedResult: The new test constructs a dataset cleanup run with zero documents but one remaining segment, and verifies that DELETE FROM document_segments is issued. Existing clean_dataset_task unit tests still pass.
notTested: Full Dify backend test suite and live Dify Cloud billing/quota recalculation were not run locally. Pytest was invoked with -o addopts='' because this local environment lacked pytest coverage plugin support for the repository-level pytest.ini coverage addopts.
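
For readers without the repository checked out, the shape of the regression check in observedResult can be sketched with `unittest.mock`. This is not the test added in the PR: `cleanup`, `run_orphan_case`, and the raw SQL strings are hypothetical stand-ins, and the real test presumably patches Dify's own session and models.

```python
from unittest.mock import MagicMock


def cleanup(session, dataset_id):
    """Hypothetical cleanup routine standing in for the task under test."""
    params = {"id": dataset_id}
    documents = session.execute(
        "SELECT id FROM documents WHERE dataset_id = :id", params
    ).fetchall()
    if documents:
        session.execute("DELETE FROM documents WHERE dataset_id = :id", params)
    # Always issued, even when no document rows were found.
    session.execute("DELETE FROM document_segments WHERE dataset_id = :id", params)
    session.commit()


def run_orphan_case():
    """Run cleanup with zero documents and return the SQL statements issued."""
    session = MagicMock()
    # Simulate the race: the document rows are already gone.
    session.execute.return_value.fetchall.return_value = []
    cleanup(session, "ds1")
    return [call.args[0] for call in session.execute.call_args_list]
```

Asserting on the issued statements, rather than on database state, keeps the check a pure unit test with no database dependency.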

@dosubot (bot) added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) on Jun 22, 2026
@github-actions

Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-06-22 23:47:32.749503746 +0000
+++ /tmp/pyrefly_pr.txt	2026-06-22 23:47:18.823475241 +0000
@@ -3241,11 +3241,11 @@
 ERROR Argument `dict[str, dict[str, str]]` is not assignable to parameter `override_config_dict` with type `AppModelConfigDict | None` in function `core.app.apps.agent_chat.app_config_manager.AgentChatAppConfigManager.get_app_config` [bad-argument-type]
   --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_config_manager.py:41:34
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
-  --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_runner.py:59:9
+  --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_runner.py:47:9
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
-  --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_runner.py:88:9
+  --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_runner.py:76:9
 ERROR Object of class `FunctionType` has no attribute `assert_called_once` [missing-attribute]
-   --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_runner.py:207:9
+   --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_app_runner.py:198:9
 ERROR Cannot index into `str` [bad-index]
    --> tests/unit_tests/core/app/apps/agent_chat/test_agent_chat_generate_response_converter.py:158:16
 ERROR Cannot index into `str` [bad-index]
@@ -3269,61 +3269,53 @@
 ERROR Argument `dict[str, str]` is not assignable to parameter `override_config_dict` with type `AppModelConfigDict | None` in function `core.app.apps.chat.app_config_manager.ChatAppConfigManager.get_app_config` [bad-argument-type]
   --> tests/unit_tests/core/app/apps/chat/test_app_config_manager.py:37:38
 ERROR No matching overload found for function `core.app.apps.chat.app_generator.ChatAppGenerator.generate` called with arguments: (app_model=SimpleNamespace, user=SimpleNamespace, args=dict[str, dict[@_, @_]], invoke_from=Literal[InvokeFrom.SERVICE_API], streaming=Literal[False]) [no-matching-overload]
-  --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:50:31
+  --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:36:31
 ERROR No matching overload found for function `core.app.apps.chat.app_generator.ChatAppGenerator.generate` called with arguments: (app_model=SimpleNamespace, user=SimpleNamespace, args=dict[str, dict[@_, @_] | int], invoke_from=Literal[InvokeFrom.SERVICE_API], streaming=Literal[False]) [no-matching-overload]
-  --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:61:31
+  --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:47:31
 ERROR No matching overload found for function `core.app.apps.chat.app_generator.ChatAppGenerator.generate` called with arguments: (SimpleNamespace, SimpleNamespace, dict[str, dict[Unknown, Unknown] | str | dict[str, str]], Literal[InvokeFrom.DEBUGGER], streaming=Literal[False]) [no-matching-overload]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:108:40
+  --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:94:40
 ERROR No matching overload found for function `core.app.apps.chat.app_generator.ChatAppGenerator.generate` called with arguments: (app_model=SimpleNamespace, user=SimpleNamespace, args=dict[str, dict[@_, @_] | str | dict[str, str]], invoke_from=Literal[InvokeFrom.SERVICE_API], streaming=Literal[False]) [no-matching-overload]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:121:35
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:107:35
 ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_generator.ChatAppGenerator._generate_worker` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:142:45
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:128:45
 ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_generator.ChatAppGenerator._generate_worker` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:143:31
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:129:31
 ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_generator.ChatAppGenerator._generate_worker` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:158:45
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:144:45
 ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_generator.ChatAppGenerator._generate_worker` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:159:31
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:145:31
 ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:186:28
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:172:28
 ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:186:49
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:172:49
 ERROR Argument `SimpleNamespace` is not assignable to parameter `conversation` with type `Conversation` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:186:70
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:172:70
 ERROR Argument `SimpleNamespace` is not assignable to parameter `message` with type `Message` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:186:89
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:172:89
 ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:217:24
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:206:24
 ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:217:45
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:206:45
 ERROR Argument `SimpleNamespace` is not assignable to parameter `conversation` with type `Conversation` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:217:66
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:206:66
 ERROR Argument `SimpleNamespace` is not assignable to parameter `message` with type `Message` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:217:85
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:206:85
 ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:254:24
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:246:24
 ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:254:45
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:246:45
 ERROR Argument `SimpleNamespace` is not assignable to parameter `conversation` with type `Conversation` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:254:60
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:246:60
 ERROR Argument `SimpleNamespace` is not assignable to parameter `message` with type `Message` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:254:79
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:246:79
 ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:289:24
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:284:24
 ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:289:45
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:284:45
 ERROR Argument `SimpleNamespace` is not assignable to parameter `conversation` with type `Conversation` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:289:66
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:284:66
 ERROR Argument `SimpleNamespace` is not assignable to parameter `message` with type `Message` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:289:85
-ERROR Argument `DummyGenerateEntity` is not assignable to parameter `application_generate_entity` with type `ChatAppGenerateEntity` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:342:24
-ERROR Argument `DummyQueueManager` is not assignable to parameter `queue_manager` with type `AppQueueManager` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:342:45
-ERROR Argument `SimpleNamespace` is not assignable to parameter `conversation` with type `Conversation` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:342:60
-ERROR Argument `SimpleNamespace` is not assignable to parameter `message` with type `Message` in function `core.app.apps.chat.app_runner.ChatAppRunner.run` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:342:79
+   --> tests/unit_tests/core/app/apps/chat/test_app_generator_and_runner.py:284:85
 ERROR Cannot index into `str` [bad-index]
   --> tests/unit_tests/core/app/apps/chat/test_generate_response_converter.py:60:16
 ERROR Cannot index into `str` [bad-index]
@@ -3385,15 +3377,15 @@
 ERROR Class `PipelineRunner` has no class attribute `call_args` [missing-attribute]
    --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_generator.py:389:12
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-  --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:92:37
+  --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:66:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:171:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:141:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:194:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:164:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:220:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:190:37
 ERROR Argument `SimpleNamespace` is not assignable to parameter `application_generate_entity` with type `RagPipelineGenerateEntity` in function `core.app.apps.pipeline.pipeline_runner.PipelineRunner.__init__` [bad-argument-type]
-   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:280:37
+   --> tests/unit_tests/core/app/apps/pipeline/test_pipeline_runner.py:249:37
 ERROR `Literal['generated-conversation-id']` is not assignable to attribute `id` with type `Never` [bad-assignment]
   --> tests/unit_tests/core/app/apps/test_advanced_chat_app_generator.py:55:22
 ERROR `Literal['generated-message-id']` is not assignable to attribute `id` with type `Never` [bad-assignment]
@@ -4514,16 +4506,6 @@
    --> tests/unit_tests/core/plugin/impl/test_trigger_client.py:179:21
 ERROR Argument `werkzeug.wrappers.request.Request` is not assignable to parameter `request` with type `flask.wrappers.Request` in function `core.plugin.impl.trigger.PluginTriggerClient.dispatch_event` [bad-argument-type]
    --> tests/unit_tests/core/plugin/impl/test_trigger_client.py:193:25
-ERROR Argument `SimpleNamespace` is not assignable to parameter `app` with type `App` in function `core.plugin.backwards_invocation.app.PluginAppBackwardsInvocation._get_user` [bad-argument-type]
-   --> tests/unit_tests/core/plugin/test_backwards_invocation_app.py:339:62
-ERROR Argument `SimpleNamespace` is not assignable to parameter `app` with type `App` in function `core.plugin.backwards_invocation.app.PluginAppBackwardsInvocation._get_user` [bad-argument-type]
-   --> tests/unit_tests/core/plugin/test_backwards_invocation_app.py:353:62
-ERROR Argument `SimpleNamespace` is not assignable to parameter `app` with type `App` in function `core.plugin.backwards_invocation.app.PluginAppBackwardsInvocation._get_user` [bad-argument-type]
-   --> tests/unit_tests/core/plugin/test_backwards_invocation_app.py:368:59
-ERROR Argument `SimpleNamespace` is not assignable to parameter `app` with type `App` in function `core.plugin.backwards_invocation.app.PluginAppBackwardsInvocation._get_workflow` [bad-argument-type]
-   --> tests/unit_tests/core/plugin/test_backwards_invocation_app.py:393:59
-ERROR Argument `SimpleNamespace` is not assignable to parameter `app` with type `App` in function `core.plugin.backwards_invocation.app.PluginAppBackwardsInvocation._get_app_model_config_dict` [bad-argument-type]
-   --> tests/unit_tests/core/plugin/test_backwards_invocation_app.py:417:74
 ERROR Argument `SimpleNamespace` is not assignable to parameter `tenant` with type `Tenant` in function `core.plugin.backwards_invocation.model.PluginModelBackwardsInvocation.invoke_summary` [bad-argument-type]
   --> tests/unit_tests/core/plugin/test_backwards_invocation_model.py:54:72
 ERROR Generator function should return `Generator` [bad-return]
@@ -6355,8 +6337,6 @@
    --> tests/unit_tests/models/test_account_models.py:595:23
 ERROR Argument `dict[str, bool | str]` is not assignable to parameter `value` with type `TenantCustomConfigDict` in function `models.account.Tenant.custom_config_dict` [bad-argument-type]
    --> tests/unit_tests/models/test_account_models.py:616:37
-ERROR Argument `dict[str, bool]` is not assignable to parameter `annotation_reply` with type `AnnotationReplyDisabledConfig | AnnotationReplyEnabledConfig | None` in function `models.model.AppModelConfig.to_dict` [bad-argument-type]
-   --> tests/unit_tests/models/test_app_models.py:357:54
 ERROR Class member `FooModel.id` overrides a member in a parent class but is missing an `@override` decorator [missing-override-decorator]
  --> tests/unit_tests/models/test_base.py:6:14
 ERROR `None` is not subscriptable [unsupported-operation]

@github-actions

Contributor

Pyrefly Type Coverage

Metric            Base      PR        Delta
Type coverage     50.86%    50.86%    +0.00%
Strict coverage   50.37%    50.37%    +0.00%
Typed symbols     30,061    30,051    -10
Untyped symbols   29,328    29,313    -15
Modules           2920      2920      0

Labels

size:XS This PR changes 0-9 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Title: Knowledge Storage remains at 50/50 MB even after deleting all datasets