Bitextor generates translation memories from multilingual websites
Python 291 43
Bicleaner is a parallel corpus classifier/cleaner that aims to detect noisy sentence pairs in a parallel corpus.
Python 155 22
Tool to fix bitexts and tag near-duplicates for removal
Python 30 3
Utility that helps you ROAM (Random, Omit, Anonymize and Mix) your parallel corpus.
Python 10 2
PDF parser and converter to HTML
Java 85 14
Extracts plain text, language identification, and other metadata from WARC records
C++ 21 5
Pre-filtering step for bicleaner
Compact Language Detector 2
Monocleaner models repository
Playwright-based web crawler