Hi, your code works great and successfully generates images from audio. The problem is that when I add the flag `--mixed_precision fp16`, I always get a black image. Have you tried running your code with this flag? I haven't been able to figure out why this happens.
`05/04/2024 15:33:47 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
/home/student/anaconda3/envs/TempToken/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
/home/student/anaconda3/envs/TempToken/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
{'dropout', 'time_embedding_act_fn', 'resnet_out_scale_factor', 'cross_attention_norm', 'time_cond_proj_dim', 'resnet_skip_time_act', 'addition_embed_type_num_heads', 'conv_out_kernel', 'timestep_post_act', 'class_embeddings_concat', 'only_cross_attention', 'addition_time_embed_dim', 'upcast_attention', 'addition_embed_type', 'num_class_embeds', 'time_embedding_dim', 'encoder_hid_dim', 'encoder_hid_dim_type', 'num_attention_heads', 'time_embedding_type', 'attention_type', 'class_embed_type', 'mid_block_only_cross_attention', 'conv_in_kernel', 'reverse_transformer_layers_per_block', 'dual_cross_attention', 'transformer_layers_per_block', 'mid_block_type', 'resnet_time_scale_shift', 'use_linear_projection', 'projection_class_embeddings_input_dim'} was not found in config. Values will be initialized to default values.
{'norm_num_groups', 'latents_mean', 'force_upcast', 'latents_std'} was not found in config. Values will be initialized to default values.
{'dynamic_thresholding_ratio', 'timestep_spacing', 'clip_sample_range', 'sample_max_value', 'prediction_type', 'rescale_betas_zero_snr', 'thresholding'} was not found in config. Values will be initialized to default values.
05/04/2024 15:33:50 - INFO - modules.BEATs.BEATs - BEATs Config: {'input_patch_size': 16, 'embed_dim': 512, 'conv_bias': False, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_wise_gradient_decay_ratio': 0.6, 'layer_norm_first': False, 'deep_norm': True, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.0, 'conv_pos': 128, 'conv_pos_groups': 16, 'relative_position_embedding': True, 'num_buckets': 320, 'max_distance': 800, 'gru_rel_pos': True, 'finetuned_model': True, 'predictor_dropout': 0.0, 'predictor_class': 527}
/home/student/anaconda3/envs/TempToken/lib/python3.8/site-packages/torchaudio/compliance/kaldi.py:616: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /opt/conda/conda-bld/pytorch_1682343998658/work/aten/src/ATen/EmptyTensor.cpp:31.)
spectrum = torch.fft.rfft(strided_input).abs()
scailing factor = 0.18215
/home/student/AudioToken/check_audioTOken.py:239: RuntimeWarning: invalid value encountered in cast
images = (image * 255).round().astype("uint8")`
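The last warning suggests the decoded image already contains NaN or inf values before the uint8 cast at line 239. NumPy emits `RuntimeWarning: invalid value encountered in cast` when a non-finite float is converted to an integer type, and the result of that conversion is undefined (it often comes out as 0, i.e. a black image). The `ComplexHalf` warning from `torchaudio/compliance/kaldi.py` also shows that the audio features are being computed in half precision, which may be where the values first break, though I haven't confirmed that. Below is a minimal sketch of a stricter conversion that fails loudly instead of silently saving a black image. `to_uint8_checked` is a hypothetical helper, not part of the repository, and it assumes the image has already been mapped to [0, 1] before the cast.

```python
import numpy as np


def to_uint8_checked(image: np.ndarray) -> np.ndarray:
    """Convert a float image in [0, 1] to uint8, refusing NaN/inf.

    Hypothetical helper (not in the AudioToken repo): a stricter stand-in
    for `(image * 255).round().astype("uint8")`.
    """
    non_finite = ~np.isfinite(image)
    if non_finite.any():
        # NaN/inf here means something upstream (e.g. an fp16 overflow)
        # already failed; casting would silently produce an undefined value.
        raise ValueError(
            f"{int(non_finite.sum())} of {image.size} pixel values are NaN/inf"
        )
    # Clip small out-of-range values, then scale to 0..255 and cast.
    return (np.clip(image, 0.0, 1.0) * 255).round().astype("uint8")
```

With this in place of the cast, an fp16 run should raise instead of writing a black image. If it does, checking the intermediate tensors (audio features, token embedding, latents) with `torch.isnan(...).any()` could narrow down which half-precision step produces the first NaN.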