feat: add entity deduplication after graph building #210

Closed
ChinmayShringi wants to merge 1 commit into main from feature/entity-deduplication

@ChinmayShringi
Owner

Hi @666ghj

I noticed that during graph building, Zep sometimes creates duplicate
entity nodes for the same real-world entity (e.g. "特朗普" ("Trump") and
"美国总统特朗普" ("US President Trump") appear as separate nodes). This
reduces the accuracy of the knowledge graph.

This PR adds an automatic entity deduplication step after graph building,
using name similarity pre-filtering + type compatibility check + LLM
confirmation to identify and merge duplicates.
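
For reviewers, the three layers can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the code in `entity_deduplicator.py`: the `Entity` record, the `difflib`-based similarity score, the 0.6 threshold, the type rule, and the `confirm` callback (a stand-in for the LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical node record; the real Zep node shape may differ."""
    uuid: str
    name: str
    type: str


def name_similarity(a: str, b: str) -> float:
    """Score two names in [0, 1]; one name containing the other scores 1.0."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    if a in b or b in a:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def types_compatible(a: Entity, b: Entity) -> bool:
    """Assumed rule: same type, or either side carries the generic 'Entity' type."""
    return a.type == b.type or "Entity" in (a.type, b.type)


def find_duplicates(
    entities: Iterable[Entity],
    confirm: Callable[[Entity, Entity], bool],
    threshold: float = 0.6,  # assumed cut-off, not taken from the PR
) -> List[Tuple[Entity, Entity]]:
    """Return pairs that pass all three layers, cheapest check first."""
    pairs = []
    for a, b in combinations(list(entities), 2):
        if name_similarity(a.name, b.name) < threshold:
            continue  # layer 1: name similarity pre-filter
        if not types_compatible(a, b):
            continue  # layer 2: type compatibility check
        if confirm(a, b):  # layer 3: expensive confirmation (LLM in the PR)
            pairs.append((a, b))
    return pairs
```

Ordering the checks this way means the expensive confirmation step only runs on pairs that survive the two cheap filters, which is presumably why the pre-filter exists.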

Would appreciate it if you could take a look when you have time.
Happy to make any changes based on your feedback. Thanks!

Summary

  • Add an entity deduplication service that identifies and merges duplicate
    nodes in the knowledge graph after building (e.g. "特朗普" ("Trump") vs
    "美国总统特朗普" ("US President Trump"))
  • When merging duplicate nodes, migrate all edges from the removed nodes
    to the primary node before deletion, preserving graph connectivity
  • Use three-layer filtering: name similarity pre-filter → type compatibility
    check → LLM confirmation
  • Integrate into the graph build pipeline automatically (at the 80%-90%
    progress stage)
  • Add a standalone POST /api/graph/deduplicate endpoint for manual dedup
  • Display a dedup report in the frontend showing which entities were merged
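
The edge migration step can be illustrated on a plain in-memory graph. This is only a sketch of the idea, assuming a hypothetical `Edge` record and `merge_nodes` helper; the PR itself works against Zep, whose node and edge calls are not shown here. Dropping self-loops created by the merge and collapsing identical edges are assumptions of this sketch, not behavior confirmed by the PR.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: a directed, named relation between node IDs."""
    source: str
    target: str
    name: str


def merge_nodes(
    nodes: Set[str],
    edges: List[Edge],
    primary: str,
    duplicates: Set[str],
) -> Tuple[Set[str], List[Edge]]:
    """Fold `duplicates` into `primary`, re-pointing edges before removal."""
    if primary in duplicates:
        raise ValueError("primary node cannot be one of its own duplicates")

    def remap(node_id: str) -> str:
        return primary if node_id in duplicates else node_id

    migrated: List[Edge] = []
    seen: Set[Edge] = set()
    for edge in edges:
        moved = replace(edge, source=remap(edge.source), target=remap(edge.target))
        # Assumption: an edge between the primary and a duplicate would become
        # a self-loop after the merge, so it is dropped.
        if moved.source == moved.target and edge.source != edge.target:
            continue
        # Assumption: edges that become identical after remapping are kept once.
        if moved in seen:
            continue
        seen.add(moved)
        migrated.append(moved)

    # Only after every edge has been re-pointed are the duplicate nodes removed.
    return nodes - duplicates, migrated
```

Because edges are re-pointed before the duplicate nodes are dropped, any neighbor that was connected only through a duplicate stays connected to the primary node.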

Changes

  • New: backend/app/services/entity_deduplicator.py
  • Modified: backend/app/api/graph.py
  • Modified: backend/app/services/__init__.py
  • Modified: frontend/src/views/Process.vue
  • Modified: frontend/src/views/MainView.vue
  • Modified: frontend/src/components/Step1GraphBuild.vue

Closes #145


Original PR: 666ghj/MiroFish#141
Original Author: @Stayfoool

@ChinmayShringi added the enhancement label on Mar 17, 2026
Development

Successfully merging this pull request may close these issues.

Request failed with status code 504