v0.4.0

@ryantwolf released this 14 Aug 21:54
07bc29d

Highlights

  • Semantic Deduplication
  • Resiliparse for Text Extraction
  • Improved Distributed Data Classification: the domain classifier is 1.55x faster through intelligent batching
  • Synthetic data generation for fine-tuning

What's Changed

New Contributors

Full Changelog: https://github.com/NVIDIA/NeMo-Curator/commits/v0.4.0