The current script builds clusters of duplicates, but there are cases where it might yield unwanted results:
When doc B is clustered under doc A's name, another doc C can also be clustered under B's name (A~B, B~C, C!~A). When we then delete the non-"extreme" docs from each cluster, we can end up keeping both A and B in the results, even though they are duplicates of each other.
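A minimal sketch of this failure mode, not the script's actual code. The names `cluster_by_name` and `keep_extremes` are hypothetical, and it assumes each cluster's "extreme" is simply the doc the cluster is named after:

```python
from collections import defaultdict


def cluster_by_name(pairs):
    """Group docs under the name of the doc they were matched against."""
    clusters = defaultdict(set)
    for name, dup in pairs:
        clusters[name].update((name, dup))
    return dict(clusters)


def keep_extremes(clusters):
    """Assumption: the kept "extreme" of a cluster is the doc it is named after."""
    return set(clusters)


# A~B and B~C were detected as duplicates; C!~A, so there is no (A, C) pair.
pairs = [("A", "B"), ("B", "C")]
clusters = cluster_by_name(pairs)
kept = keep_extremes(clusters)
print(clusters)
print(kept)
```

Here B is removed as a member of A's cluster but kept as the extreme of its own cluster, so A and B both survive.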
A better way to delete duplicates is to find communities within each connected component of the duplicate graph. This approach is used in https://github.com/src-d/gemini.
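As a rough illustration of the first half of that idea, the sketch below groups docs into connected components with a small union-find. It is not taken from gemini or from the current script; `connected_components` and the sample pairs are hypothetical, and the community detection step inside each component is deliberately left out:

```python
def connected_components(pairs):
    """Return sorted connected components of an undirected duplicate graph."""
    parent = {}

    def find(x):
        # Register unseen docs as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for doc in list(parent):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical duplicate pairs: A~B, B~C, and an unrelated pair D~E.
components = connected_components([("A", "B"), ("B", "C"), ("D", "E")])
print(components)
```

Note that A, B, and C land in one component even though C!~A. Keeping a single doc per component would therefore drop C as well, which is why a community detection pass within each component is suggested rather than stopping at connected components.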