Commit

* fix save_ckpt bug: error if the number of samples in the result dataset is less than the number of workers when saving dataset to disk (#536)
HYLcool authored Jan 14, 2025
1 parent 6a869c2 commit 0575193
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion data_juicer/utils/ckpt_utils.py
@@ -121,7 +121,9 @@ def save_ckpt(self, ds):
         :param ds: input dataset to save
         """
-        ds.save_to_disk(self.ckpt_ds_dir, num_proc=self.num_proc)
+        left_sample_num = len(ds)
+        ds.save_to_disk(self.ckpt_ds_dir,
+                        num_proc=min(self.num_proc, left_sample_num))

         with open(self.ckpt_op_record, 'w') as fout:
             json.dump(self.op_record, fout)
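
The clamp introduced by this change can be sketched in isolation. This is a minimal illustration, not code from the repository: `clamp_num_proc` is a hypothetical helper name, and the lower bound of 1 for an empty dataset is an added assumption that the commit itself does not include. The commit message states only that saving fails when the dataset has fewer samples than workers, so the sketch limits the worker count to the sample count.

```python
def clamp_num_proc(num_proc: int, num_samples: int) -> int:
    """Return a worker count that never exceeds the number of samples.

    Hypothetical helper for illustration; not part of data_juicer.
    """
    # Same clamp as the commit: min(self.num_proc, left_sample_num).
    clamped = min(num_proc, num_samples)
    # Assumption beyond the commit: keep at least one worker so an
    # empty dataset does not produce a worker count of 0.
    return max(1, clamped)


# Stand-in for a small result dataset left after filtering.
samples = ["a", "b", "c"]

few_samples = clamp_num_proc(8, len(samples))   # 8 workers, 3 samples -> 3
enough_samples = clamp_num_proc(2, len(samples))  # 2 workers, 3 samples -> 2
```

With a helper like this, the call in `save_ckpt` would pass the clamped value as `num_proc` in place of `self.num_proc`, which is what the inline `min(...)` in the diff does.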

0 comments on commit 0575193
