Redimnet configs for B0 and B1 models #400

Open

vandana-rajan opened this issue Jan 8, 2025 · 4 comments

Comments

@vandana-rajan
Hello,

The ReDimNet config you have uploaded is for the ReDimNetB2 model. Could you please upload the configs for the B0 and B1 models that would replicate the results you obtained?

Thanks,
VM

@wsstriving
Collaborator

Thank you for your attention! You can find all the configurations in model.py. Simply copy the relevant configurations into the config file to access all the other models.

@vandana-rajan
Author

vandana-rajan commented Jan 10, 2025

Thank you for your response.

So I just modified redimnet.yaml like this (feat_dim is 60 for ReDimNetB0 in https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/models/redimnet.py#L874):

model: ReDimNetB0
model_args:
  embed_dim: 192
  feat_dim: 60
  pooling_func: ASTP
  two_emb_layer: false

However, I am getting an error:

File "/home/vandana/wespeaker/wespeaker/models/redimnet.py", line 786, in forward
    outputs_1d.append(self.run_stage(outputs_1d, stage_ind))
File "/home/vandana/wespeaker/wespeaker/models/redimnet.py", line 778, in run_stage
    x = self.to2d(x, c, f)
File "/home/vandana/wespeaker/wespeaker/models/redimnet.py", line 765, in to2d
    return x.reshape((bs, f, c, t)).permute((0, 2, 1, 3))
RuntimeError: shape '[256, 60, 10, 200]' is invalid for input of size 36864000
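For what it's worth, the numbers in the error message already point at the mismatch: the reshape expects feat_dim = 60, but the element count of the input tensor implies the fbank features still had 72 mel bins. A minimal arithmetic check (the 256 / 10 / 200 values are taken straight from the error shape, so this is an inference from the message, not from the repo code):

```python
# The failing reshape wants bs * f * c * t = 256 * 60 * 10 * 200 elements,
# but the error reports the tensor actually holds 36,864,000 elements.
# Backing out the frequency dimension shows what the features really had.
bs, c, t = 256, 10, 200            # batch, channels, frames, from the error
total = 36864000                   # "input of size 36864000"
actual_f = total // (bs * c * t)   # frequency bins actually present
expected_total = bs * 60 * c * t   # element count that feat_dim=60 implies
print(actual_f, expected_total)    # 72 vs. 30,720,000: fbank still used 72 bins
```

So the fix is exactly what the next comment does: make num_mel_bins in fbank_args match the model's feat_dim.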

Do I need to change anything else in the config file?

@vandana-rajan
Author

Update:

I changed num_mel_bins to 60 and it has started training. I also had to reduce batch_size to 64 due to resource constraints. Let's see how this goes.

Thanks.

@vandana-rajan
Author

So, I am getting very high EER values for the B0 model. Something is wrong, and I am not sure what. This is my config file:

data_type: shard
dataloader_args:
  batch_size: 32
  drop_last: true
  num_workers: 4
  pin_memory: false
  prefetch_factor: 4
dataset_args:
  aug_prob: 0.6
  fbank_args:
    dither: 1.0
    frame_length: 25
    frame_shift: 10
    num_mel_bins: 72
  filter: true
  filter_args:
    max_num_frames: 800
    min_num_frames: 100
  num_frms: 200
  resample_rate: 16000
  sample_num_per_epoch: 0
  shuffle: true
  shuffle_args:
    shuffle_size: 2500
  spec_aug: false
  spec_aug_args:
    max_f: 8
    max_t: 10
    num_f_mask: 1
    num_t_mask: 1
    prob: 0.6
  speed_perturb: true
enable_amp: false
exp_dir: ./redimnet_b0_train_new
gpus:
- 0
- 1
- 2
- 3
- 4
- 5
log_batch_interval: 100
loss: CrossEntropyLoss
loss_args: {}
margin_scheduler: MarginScheduler
margin_update:
  epoch_iter: 5687
  final_margin: 0.2
  fix_start_epoch: 40
  increase_start_epoch: 20
  increase_type: exp
  initial_margin: 0.0
  update_margin: true
model: ReDimNetB0
model_args:
  embed_dim: 192
  feat_dim: 72
  pooling_func: ASTP
  two_emb_layer: false
model_init: null
noise_data: data/musan/lmdb
num_avg: 10
num_epochs: 120
optimizer: SGD
optimizer_args:
  lr: 0.1
  momentum: 0.9
  nesterov: true
  weight_decay: 2.0e-05
projection_args:
  do_lm: false
  easy_margin: false
  embed_dim: 192
  num_class: 17982
  project_type: arc_margin
  scale: 32.0
reverb_data: data/rirs/lmdb
save_epoch_interval: 5
scheduler: ExponentialDecrease
scheduler_args:
  epoch_iter: 5687
  final_lr: 5.0e-05
  initial_lr: 0.1
  num_epochs: 120
  scale_ratio: 3.0
  warm_from_zero: true
  warm_up_epoch: 6
seed: 42
train_data: data/vox2_dev/shard.list
train_label: data/vox2_dev/utt2spk
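One quick guard against the earlier shape mismatch is to assert that the feature and model dimensions agree before launching a run. A minimal sketch, with the two values hard-coded from the config above:

```python
# Sketch of a consistency check: ReDimNet's 1D->2D reshape requires the
# model's feat_dim to equal the number of mel bins in the input features.
# The values below are copied by hand from the config above.
config = {
    "dataset_args": {"fbank_args": {"num_mel_bins": 72}},
    "model_args": {"feat_dim": 72},
}

mel_bins = config["dataset_args"]["fbank_args"]["num_mel_bins"]
feat_dim = config["model_args"]["feat_dim"]
assert mel_bins == feat_dim, f"num_mel_bins ({mel_bins}) != feat_dim ({feat_dim})"
print("feature/model dims agree:", feat_dim)
```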

I am doing model averaging over the last 10 epochs. The scores I am getting are:

---- vox1_O_cleaned.kaldi.score -----
EER = 3.264
minDCF (p_target:0.01 c_miss:1 c_fa:1) = 0.357
---- vox1_E_cleaned.kaldi.score -----
EER = 3.197
minDCF (p_target:0.01 c_miss:1 c_fa:1) = 0.339
---- vox1_H_cleaned.kaldi.score -----
EER = 5.369
minDCF (p_target:0.01 c_miss:1 c_fa:1) = 0.476
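For context, the EER values above are the operating points where the false-accept and false-reject rates cross. A minimal, self-contained sketch of how such a number is computed from trial scores (toy scores for illustration, not the actual vox1 trials, and not the scoring script used in the repo):

```python
def eer(target_scores, nontarget_scores):
    """Approximate Equal Error Rate: scan candidate thresholds and return
    (FAR + FRR) / 2 at the threshold where the two rates are closest."""
    best_diff, best_eer = float("inf"), 1.0
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False-accept rate: non-target trials scoring at or above threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # False-reject rate: target trials scoring below threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_diff:
            best_diff, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy example: perfectly separated scores give an EER of 0.0.
print(eer([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # -> 0.0
```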

Any advice would be greatly appreciated.

Thanks
VM
