Merge branch 'ray_dedup_with_actor' of github.com:chenyushuo/data-juicer into ray_dedup_with_actor
chenyushuo committed Jan 20, 2025
2 parents 24a8640 + ecc4c0a commit faec9f2
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/Operators.md
@@ -78,10 +78,10 @@ All the specific operators are listed below, each featured with several capabili
| document_simhash_deduplicator | 🔤Text 💻CPU 🟡Beta | Deduplicator to deduplicate samples at document-level using SimHash. Deduplicator 使用 SimHash 在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/document_simhash_deduplicator.py) | [tests](../tests/ops/deduplicator/test_document_simhash_deduplicator.py) |
| image_deduplicator | 🏞Image 💻CPU 🟡Beta | Deduplicator to deduplicate samples at document-level using exact matching of images between documents. 重复数据删除器使用文档之间图像的精确匹配来在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/image_deduplicator.py) | [tests](../tests/ops/deduplicator/test_image_deduplicator.py) |
| ray_basic_deduplicator | 💻CPU 🔴Alpha | Backend for deduplicator. 重复数据删除器的后端。 | [code](../data_juicer/ops/deduplicator/ray_basic_deduplicator.py) | - |
-| ray_bts_minhash_deduplicator | 🔤Text 💻CPU 🔴Alpha | A distributed implementation of Union-Find with load balancing. 具有负载平衡功能的 Union-Find 的分布式实现。 | [code](../data_juicer/ops/deduplicator/ray_bts_minhash_deduplicator.py) | - |
-| ray_document_deduplicator | 🔤Text 💻CPU 🔴Alpha | Deduplicator to deduplicate samples at document-level using exact matching. 重复数据删除器使用精确匹配在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/ray_document_deduplicator.py) | - |
-| ray_image_deduplicator | 🏞Image 💻CPU 🔴Alpha | Deduplicator to deduplicate samples at document-level using exact matching of images between documents. 重复数据删除器使用文档之间图像的精确匹配来在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/ray_image_deduplicator.py) | - |
-| ray_video_deduplicator | 🎬Video 💻CPU 🔴Alpha | Deduplicator to deduplicate samples at document-level using exact matching of videos between documents. 重复数据删除器使用文档之间视频的精确匹配来在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/ray_video_deduplicator.py) | - |
+| ray_bts_minhash_deduplicator | 🔤Text 💻CPU 🟡Beta | A distributed implementation of Union-Find with load balancing. 具有负载平衡功能的 Union-Find 的分布式实现。 | [code](../data_juicer/ops/deduplicator/ray_bts_minhash_deduplicator.py) | [tests](../tests/ops/deduplicator/test_ray_bts_minhash_deduplicator.py) |
+| ray_document_deduplicator | 🔤Text 💻CPU 🟡Beta | Deduplicator to deduplicate samples at document-level using exact matching. 重复数据删除器使用精确匹配在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/ray_document_deduplicator.py) | [tests](../tests/ops/deduplicator/test_ray_document_deduplicator.py) |
+| ray_image_deduplicator | 🏞Image 💻CPU 🟡Beta | Deduplicator to deduplicate samples at document-level using exact matching of images between documents. 重复数据删除器使用文档之间图像的精确匹配来在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/ray_image_deduplicator.py) | [tests](../tests/ops/deduplicator/test_ray_image_deduplicator.py) |
+| ray_video_deduplicator | 🎬Video 💻CPU 🟡Beta | Deduplicator to deduplicate samples at document-level using exact matching of videos between documents. 重复数据删除器使用文档之间视频的精确匹配来在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/ray_video_deduplicator.py) | [tests](../tests/ops/deduplicator/test_ray_video_deduplicator.py) |
| video_deduplicator | 🎬Video 💻CPU 🟡Beta | Deduplicator to deduplicate samples at document-level using exact matching of videos between documents. 重复数据删除器使用文档之间视频的精确匹配来在文档级别删除重复样本。 | [code](../data_juicer/ops/deduplicator/video_deduplicator.py) | [tests](../tests/ops/deduplicator/test_video_deduplicator.py) |

## filter <a name="filter"/>

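The `ray_bts_minhash_deduplicator` row in the diff above describes its approach as Union-Find. As a rough illustration of that idea only, here is a minimal single-process sketch of how Union-Find can merge known duplicate pairs into groups and keep one sample per group. It is not the code in `data_juicer`: it uses no Ray, no MinHash, and no load balancing, and the names `UnionFind` and `dedup_by_pairs` are hypothetical.

```python
class UnionFind:
    """Minimal single-process disjoint-set structure (illustrative only)."""

    def __init__(self, n):
        # Every element starts as the root of its own one-element set.
        self.parent = list(range(n))

    def find(self, x):
        # Walk up to the root, halving the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the larger root under the smaller one, so the smallest
        # index in each group is always its representative.
        if ra < rb:
            self.parent[rb] = ra
        else:
            self.parent[ra] = rb


def dedup_by_pairs(samples, duplicate_pairs):
    """Keep one sample (the lowest-indexed) per group of duplicates.

    `duplicate_pairs` is assumed to come from some upstream similarity
    step (hypothetical here), as (index_a, index_b) tuples.
    """
    uf = UnionFind(len(samples))
    for a, b in duplicate_pairs:
        uf.union(a, b)
    return [s for i, s in enumerate(samples) if uf.find(i) == i]
```

For example, with five samples and the pairs `(0, 2)`, `(2, 4)`, `(1, 3)`, samples 0, 2, 4 form one group and 1, 3 another, so only samples 0 and 1 are kept.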