Thank you for sharing the s2s repo
I am facing two issues:
1: How can I use my own .JSONL data format instead of the VoiceAssistant-400K .Parquet format?
I have prepared data in .jsonl format as given in the demo:
{"key": 1, "source_wav": "/DATA/Adarsh/DAS_FR/covost_split/test/common_voice_fr_17299399.mp3.wav", "source_text": "Un vrai travail intéressant va, enfin, être mené sur ce sujet.", "target_wav": "/DATA/Adarsh/DAS_FR/datafolder/cvss_c/fr-en/test/common_voice_fr_17299399.mp3.wav", "target_text": "really interesting work will finally be undertaken on that topic"}
{"key": 2, "source_wav": "/DATA/Adarsh/DAS_FR/covost_split/test/common_voice_fr_17299400.mp3.wav", "source_text": "Une réforme profonde est nécessaire.", "target_wav": "/DATA/Adarsh/DAS_FR/datafolder/cvss_c/fr-en/test/common_voice_fr_17299400.mp3.wav", "target_text": "a profound reform is necessary"}
{"key": 3, "source_wav": "/DATA/Adarsh/DAS_FR/covost_split/test/common_voice_fr_17299401.mp3.wav", "source_text": "Pas si nombreuses que ça", "target_wav": "/DATA/Adarsh/DAS_FR/datafolder/cvss_c/fr-en/test/common_voice_fr_17299401.mp3.wav", "target_text": "not that many"}
{"key": 4, "source_wav": "/DATA/Adarsh/DAS_FR/covost_split/test/common_voice_fr_17300796.mp3.wav", "source_text": "Un comité interministériel du handicap s’est tenu il y a quelques semaines.", "target_wav": "/DATA/Adarsh/DAS_FR/datafolder/cvss_c/fr-en/test/common_voice_fr_17300796.mp3.wav", "target_text": "an inter ministerial committee on disability was held a few weeks back"}
but this is throwing an error:
File "/DATA/anaconda3/envs/Lalaram_SLAM-Omni/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
self._download_and_prepare(
File "/DATA/anaconda3/envs/Lalaram_SLAM-Omni/lib/python3.10/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/DATA/anaconda3/envs/Lalaram_SLAM-Omni/lib/python3.10/site-packages/datasets/builder.py", line 1860, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/DATA/anaconda3/envs/Lalaram_SLAM-Omni/lib/python3.10/site-packages/datasets/builder.py", line 1991, in _prepare_split_single
raise DatasetGenerationCastError.from_cast_error(
datasets.exceptions.DatasetGenerationCastError: An error occurred while generating the dataset
All the data files must have the same columns, but at some point there are 5 new columns (key, target_wav, target_text, source_wav, source_text) and 8 missing columns (round, index, question, answer_cosyvoice_speech_token, split_name, question_audio, answer_snac, answer).
This happened while the JSONL dataset builder was generating data using
/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/train-00000-of-00100.jsonl
So, for the .jsonl format as well, do I have to maintain these 8 columns (round, index, question, answer_cosyvoice_speech_token, split_name, question_audio, answer_snac, answer), or the columns key, target_wav, target_text, source_wav, source_text as given in the demo jsonl file?
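For context, the cast error above is raised by the Hugging Face `datasets` JSON builder when files under the same data path have different schemas (exactly what the "5 new columns / 8 missing columns" message describes). A minimal check, assuming the path from the traceback above, is to load a single JSONL file on its own and inspect the columns the builder infers:

```python
from datasets import load_dataset

# Point at one JSONL file only (path taken from the traceback above), so no
# schema casting across files is attempted, then print the inferred columns.
ds = load_dataset(
    "json",
    data_files="/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/train-00000-of-00100.jsonl",
    split="train",
)
print(ds.column_names)  # should show either the 5 demo fields or the 8 VoiceAssistant fields
```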
2: I have set the resume condition, but training still starts from step 0 instead of resuming from the step where it stopped.
examples/s2s/model/slam_model_s2s.py:96: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt_dict = torch.load(ckpt_path, map_location="cpu")
[2025-02-04 16:16:54][slam_llm.utils.train_utils][INFO] - --> Module Qwen2-0.5b
[2025-02-04 16:16:54][slam_llm.utils.train_utils][INFO] - --> Qwen2-0.5b has 494.032768 Million params
[2025-02-04 16:16:54][slam_llm.utils.train_utils][INFO] - --> Module Qwen2-0.5b
[2025-02-04 16:16:54][slam_llm.utils.train_utils][INFO] - --> Qwen2-0.5b has 494.032768 Million params
[2025-02-04 16:16:54][slam_llm.utils.train_utils][INFO] - --> Module linear
[2025-02-04 16:16:54][slam_llm.utils.train_utils][INFO] - --> linear has 9.702272 Million params
[2025-02-04 16:16:56][slam_model_s2s.py][INFO] - loading other parts from: /DATA/Lalaram/SLAM_omni/SLAM-LLM/s2s_train_v4-Qwen2-0.5b-gpu2-btz1-lr1e-4-fp16-epochs10-whisper_small-latency0-group1/s2s_epoch_1_step_99000/model.pt
examples/s2s/model/slam_model_s2s.py:96: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt_dict = torch.load(ckpt_path, map_location="cpu")
[2025-02-04 16:16:57][slam_llm.utils.train_utils][INFO] - --> Model s2s
[2025-02-04 16:16:57][slam_llm.utils.train_utils][INFO] - --> s2s has 507.519744 Million params
Training Epoch: 1/10, step 2999/229220 completed (loss: 1.524409532546997, acc: 0.6829268336296082): 1%|▊ | 3000/229220 [23:09<29:43:45, 2.11it/s]
We recently fixed support for the .JSONL file format. You can find the updated JSONL demo here: jsonl_demo-en.jsonl. Please note that users need to manually generate the corresponding audio tokens for the response audio. Also, when using .JSONL format data, you must set manifest_format=parquet in the script.
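As a rough illustration of that workflow, the 5-field rows shown above could be remapped onto the 8 columns named in the cast error. This is only a sketch, not the official converter: the field mapping, the file names, and the `extract_speech_tokens` helper are assumptions, and the exact expected values (e.g. whether question_audio is a path or embedded audio, and how the CosyVoice tokens are produced) should follow the updated jsonl_demo-en.jsonl.

```python
import json

def extract_speech_tokens(wav_path):
    # Hypothetical helper: generate the answer speech tokens for this wav
    # (e.g. with a CosyVoice tokenizer), which per the reply above must be done manually.
    raise NotImplementedError

# Hypothetical input/output file names; the column mapping below is an assumption.
with open("covost_test.jsonl") as fin, open("covost_test_slam.jsonl", "w") as fout:
    for line in fin:
        row = json.loads(line)
        out = {
            "index": row["key"],
            "round": 1,
            "split_name": "test",
            "question": row["source_text"],
            "question_audio": row["source_wav"],
            "answer": row["target_text"],
            "answer_cosyvoice_speech_token": extract_speech_tokens(row["target_wav"]),
            "answer_snac": None,
        }
        fout.write(json.dumps(out, ensure_ascii=False) + "\n")
```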
Currently, SLAM-LLM does not support a resume mechanism. We apologize for any inconvenience this may cause.
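For reference only, and not as SLAM-LLM functionality: a resume mechanism would need to persist and restore the optimizer state and the global step in addition to the model weights (which the log above shows are already reloaded from model.pt). A generic PyTorch sketch of that bookkeeping, with illustrative function names:

```python
import torch

def save_checkpoint(path, model, optimizer, epoch, step):
    # Persist everything needed to continue training, not just the weights.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,
            "step": step,
        },
        path,
    )

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    # The training loop is then responsible for fast-forwarding to ckpt["step"].
    return ckpt["epoch"], ckpt["step"]
```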