about chunk_num_per_shard #1

Open
snowkcon opened this issue Nov 2, 2024 · 7 comments

Comments

@snowkcon

snowkcon commented Nov 2, 2024

if len(self._chunks) % self.chunk_num_per_shard == 0:

AttributeError: 'ChunkedDatasetBuilder' object has no attribute 'chunk_num_per_shard'

t1101675 added a commit that referenced this issue Nov 7, 2024
@t1101675
Member

t1101675 commented Nov 7, 2024

Fixed.

@snowkcon
Author

about grouped_infer

scripts/miniplm/difference_sampling/1.8B.sh

[rank0]: AttributeError: 'Namespace' object has no attribute 'grouped_infer'

@t1101675
Member

It seems the argument was removed by mistake during code cleanup.
The bug is fixed. Thanks for pointing it out!
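
For anyone hitting a similar error, here is a minimal sketch of how dropping an `add_argument` call produces this exact `AttributeError`. It is not the MiniPLM code: `build_parser`, `--batch-size`, and the two helper functions are hypothetical names used only for illustration, and the `--grouped-infer` flag is assumed to be a boolean switch.

```python
import argparse


def build_parser(include_grouped_infer: bool) -> argparse.ArgumentParser:
    """Build a toy parser; the flag can be left out to mimic the cleanup mistake."""
    parser = argparse.ArgumentParser()
    # Hypothetical unrelated option, present in both variants.
    parser.add_argument("--batch-size", type=int, default=8)
    if include_grouped_infer:
        # Assumption: the real option is a boolean switch.
        parser.add_argument("--grouped-infer", action="store_true")
    return parser


def uses_grouped_infer(args: argparse.Namespace) -> bool:
    # Direct attribute access fails if the argument was never registered.
    return args.grouped_infer


def safe_uses_grouped_infer(args: argparse.Namespace) -> bool:
    # Defensive variant: fall back to False when the attribute is missing.
    return getattr(args, "grouped_infer", False)


args_broken = build_parser(include_grouped_infer=False).parse_args([])
args_fixed = build_parser(include_grouped_infer=True).parse_args(["--grouped-infer"])

try:
    uses_grouped_infer(args_broken)
    message = ""
except AttributeError as err:
    message = str(err)

print(message)
print(uses_grouped_infer(args_fixed))
print(safe_uses_grouped_infer(args_broken))
```

Restoring the argument definition is the real fix. The `getattr` fallback only hides a missing option, so it is better suited to optional flags than to ones the script depends on.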

@snowkcon
Author

  1. construct_pretrain_data.py contains no method call.
  2. The README section about construct_pretrain_data.py does not say that a ratio must be entered.

@t1101675
Member

Fixed.

@snowkcon
Author

snowkcon commented Nov 13, 2024

about readme

Vanilla KD
bash scripts/vanilla_kd/qwen/200M.sh /PATH/TO/MiniPLM
bash scripts/vanilla_kd/qwen/500M.sh /PATH/TO/MiniPLM
bash scripts/vanilla_kd/qwen/1.2B.sh /PATH/TO/MiniPLM
SeqKD
bash scripts/seqkd/qwen/200M.sh /PATH/TO/MiniPLM
bash scripts/seqkd/qwen/500M.sh /PATH/TO/MiniPLM
bash scripts/seqkd/qwen/1.2B.sh /PATH/TO/MiniPLM

@t1101675
Member

Fixed
