Pull requests: NVIDIA/NeMo-Curator

Edit flaky PyTest in test_download
#454 opened Dec 23, 2024 by sarahyurick
ci: Update release.yml
#452 opened Dec 21, 2024 by ko3n1g
3 tasks
Change filename column name to file_name [gpuci]
#449 opened Dec 20, 2024 by praateekmahajan
3 tasks
Support the new minhash 25.02 api [gpuci]
#445 opened Dec 20, 2024 by praateekmahajan
3 tasks
[WIP] Add RAPIDS Nightly to GPU CI [gpuci]
#436 opened Dec 17, 2024 by praateekmahajan (draft)
3 tasks
Updating the Quick Example
#432 opened Dec 16, 2024 by stsfaroz
Add TrafilaturaExtractor class
#431 opened Dec 13, 2024 by sarahyurick
Bump nltk from 3.8.1 to 3.9 in /tutorials/dapt-curation/code [dependencies]
#429 opened Dec 13, 2024 by dependabot (bot)
Create notebook tutorials for distributed data classifiers [documentation]
#415 opened Dec 6, 2024 by sarahyurick
3 tasks done
Added LookUp error handling during encoding detection.
#412 opened Dec 6, 2024 by ggcr
Create separate files for each deduplication class [gpuci]
#409 opened Dec 3, 2024 by sarahyurick
Version bump to 0.6.0rc1.dev0
#396 opened Nov 27, 2024 by github-actions (bot)
Fix GPU error messages for fuzzy deduplication [gpuci]
#387 opened Nov 22, 2024 by sarahyurick
2 tasks done
Fuzzy Dedup: Make skipping the False positive check the default [enhancement, gpuci]
#386 opened Nov 21, 2024 by ayushdg
2 of 3 tasks
Remove max_text_bytes_per_part [gpuci]
#385 opened Nov 20, 2024 by sarahyurick
Global cache_dir variable for exact, fuzzy, and semantic deduplication [gpuci]
#384 opened Nov 19, 2024 by sarahyurick
3 tasks done
ci: Add copyright-check workflow
#369 opened Nov 14, 2024 by ko3n1g
3 tasks
Added example notebook for translation with ct2 model. [documentation]
#262 opened Sep 25, 2024 by uahmed93 (draft)
3 tasks
Fixed bug: changed to correct model name
#186 opened Aug 6, 2024 by ByteWrite
1 of 3 tasks