InfinyOn Cloud and Fluvio users need to remove duplicates from the collected records.
Deduplication can happen at the record level either before the topic, in the producer, or after the topic, in SmartModules.
The deduplication module has the following constraints:
Retention policy - time
Volume of records - record count or size of records
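As a rough illustration of how these two constraints could bound the deduplication state, here is a minimal sketch of a window that forgets keys once they are older than a time limit or once the entry count exceeds a cap. The names `DedupBounds` and `BoundedWindow` are hypothetical and are not part of the Fluvio API. The sketch assumes record timestamps are non-decreasing, and it covers only the record-count form of the volume constraint, not record size.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical bounds for the deduplication window (illustrative names).
pub struct DedupBounds {
    /// Time constraint: keys older than this are forgotten.
    pub max_age_ms: u64,
    /// Volume constraint: maximum number of keys remembered.
    pub max_entries: usize,
}

/// Remembers recently seen keys, bounded by age and by count.
pub struct BoundedWindow {
    bounds: DedupBounds,
    seen: HashSet<String>,
    /// (timestamp_ms, key) pairs, oldest first.
    order: VecDeque<(u64, String)>,
}

impl BoundedWindow {
    pub fn new(bounds: DedupBounds) -> Self {
        BoundedWindow {
            bounds,
            seen: HashSet::new(),
            order: VecDeque::new(),
        }
    }

    /// Number of keys currently remembered.
    pub fn len(&self) -> usize {
        self.order.len()
    }

    /// Drops the oldest keys while they are too old or the window is too full.
    fn evict(&mut self, now_ms: u64) {
        loop {
            let oldest_ts = match self.order.front() {
                Some((ts, _)) => *ts,
                None => break,
            };
            let too_old = now_ms.saturating_sub(oldest_ts) > self.bounds.max_age_ms;
            let too_many = self.order.len() > self.bounds.max_entries;
            if !(too_old || too_many) {
                break;
            }
            if let Some((_, key)) = self.order.pop_front() {
                self.seen.remove(&key);
            }
        }
    }

    /// Returns true if the key is new within the window (keep the record),
    /// false if it is a duplicate (drop the record).
    pub fn insert(&mut self, key: &str, ts_ms: u64) -> bool {
        self.evict(ts_ms);
        if self.seen.contains(key) {
            return false;
        }
        self.seen.insert(key.to_string());
        self.order.push_back((ts_ms, key.to_string()));
        self.evict(ts_ms);
        true
    }
}
```

One consequence of bounding the window this way is that a key evicted by either limit is treated as new if it shows up again, so the bounds effectively define how far back duplicates can be detected.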
High level ideas
The deduplication process will use an index built from designated keys in the records to identify duplicates.
The module will build the index from the historical data in the topic.
The initial implementation targets relatively small datasets with incremental identifiers or keys, such as timestamps, that identify the duplicates.
Based on the lessons from this implementation and on user feedback, we will then work out the implementation at the stream processing unit level.
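The ideas above could be sketched roughly as follows: an index of keys is seeded from the records already in the topic, and each incoming record is kept only if its key is not yet in the index. This is a sketch under stated assumptions, not the planned design. `Record`, `DedupIndex`, `from_history`, and `dedup` are hypothetical names, and the designated key is assumed to be a single `u64` such as a timestamp.

```rust
use std::collections::HashSet;

/// Hypothetical record shape: one designated key plus an opaque payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    /// Designated key, e.g. an event timestamp (assumption).
    pub key: u64,
    pub payload: String,
}

/// In-memory index of keys that have already been seen.
pub struct DedupIndex {
    seen: HashSet<u64>,
}

impl DedupIndex {
    /// Builds the index from historical records already in the topic.
    pub fn from_history<'a, I>(history: I) -> Self
    where
        I: IntoIterator<Item = &'a Record>,
    {
        let seen = history.into_iter().map(|r| r.key).collect();
        DedupIndex { seen }
    }

    /// True if a record with the same key has already been seen.
    pub fn is_duplicate(&self, record: &Record) -> bool {
        self.seen.contains(&record.key)
    }

    /// Keeps only records whose key is new, and adds those keys to the index.
    pub fn dedup(&mut self, incoming: Vec<Record>) -> Vec<Record> {
        incoming
            .into_iter()
            // HashSet::insert returns false when the key was already present.
            .filter(|r| self.seen.insert(r.key))
            .collect()
    }
}
```

Because the whole index lives in memory and grows with every new key, this shape only suits the relatively small datasets the initial scope describes.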
High level diagram of the flow:
To Update:
basic technical design elements describing the solution.
drc-infinyon changed the title from "deduplication module to remove duplicates from topics using incremental identifiers/keys like timestamp" to "[Feature] Deduplication functionality to remove duplicates from topics using incremental identifiers/keys like timestamp" on Jul 5, 2023.