
C++ infer #5

Open

maozhiqiang opened this issue Mar 21, 2019 · 23 comments
maozhiqiang commented Mar 21, 2019

Hi @geneing! I used the C++ code for inference, but it is very slow:

mels shape: (80, 500)
take times:53.759968996047974

Seven seconds of audio takes about 53 seconds! My hparams:

rnn_dims=256,
fc_dims=128,
sparsity_target=0.90,

How can I optimize the parameters? Thanks!

geneing (Owner) commented Mar 21, 2019

@maozhiqiang That's unexpectedly slow. On my computer (a 6-year-old laptop) with the same hparams it runs a little slower than real time. Let's check a few things:

  1. Did you run training long enough to prune the weights? The current code should print how pruning is progressing.
  2. I'm assuming you ran convert_model to create a weight file.
  3. Have you compiled the library with optimization? Without at least the "-O2" optimization level the code will run very slowly. Surprisingly, "-O3" produces a library that runs ~30% slower than "-O2". I get the best results with "-O2 -ffast-math".

Sorry about the lack of detailed instructions. I'll get it done...

maozhiqiang (Author) commented Mar 21, 2019

Hi @geneing! Thank you for your reply. The training log shows:

epoch:1801, running loss:211.44811499118805, average loss:1.5547655514057945, current lr:0.0002390525520373693, num_pruned:530824 (0.9%)

I used convert_model to convert the .pth checkpoint to a .bin model. I compiled with cmake; how do I compile with -O2?
Thank you!

geneing (Owner) commented Mar 21, 2019

Run ccmake or cmake-gui. Switch to advanced mode ("t" in ccmake / a checkbox in cmake-gui). Find the CMAKE_BUILD_TYPE entry and type RelWithDebInfo. Find CMAKE_CXX_FLAGS_RELWITHDEBINFO and edit it to include the -ffast-math flag.

You can also set build type from cmake command line: https://cmake.org/pipermail/cmake/2008-March/020347.html
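For example, from a fresh build directory (a minimal sketch; adjust the ".." to wherever your source checkout lives):

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -ffast-math" ..
make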

maozhiqiang (Author) commented:

I just added

SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -Wall -O2")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall -O2")

to CMakeLists.txt and recompiled. Now the speed is:

mels shape: (80, 500)
5.093661546707153

Seven seconds of audio takes about 5 seconds! Is this correct? Thank you!

geneing (Owner) commented Mar 21, 2019

Sounds right. The Eigen3 library I use employs every templating trick to get the best performance. When optimized, its performance is excellent; in debug mode it is very inefficient.

A few more flags to play with: "-ffast-math", "-march=native".
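For example, extending the SET line from the previous comment (a sketch; note that -march=native ties the binary to the build machine's CPU):

SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall -O2 -ffast-math -march=native")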

maozhiqiang (Author) commented Mar 21, 2019

Thank you @geneing! But the output is all noise. Did I go wrong somewhere?

geneing (Owner) commented Mar 21, 2019

  1. Make sure the input mel is correct - e.g. use one of the training set inputs.
  2. It may be easier to debug using the Python library:

import sys
sys.path.insert(0, 'lib/build-src-RelDebInfo')  # use the correct path to the shared library WaveRNNVocoder...so

import numpy as np
import WaveRNNVocoder

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')  # your converted weight file
mel = np.load(fname).T  # fname: a mel .npy file; check that the first dimension is 80 and plot to confirm it looks like a correct mel
wav = vocoder.melToWav(mel)  # plot to see what the wav looks like
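A quick way to eyeball the mel (a sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

plt.imshow(mel, aspect='auto', origin='lower')  # 80 mel bands on the y-axis; energy bands should track the speech
plt.show()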

maozhiqiang (Author) commented:

Thank you!

maozhiqiang (Author) commented Mar 21, 2019

Hi @geneing!
When I use synthesize.py the output is normal, but test_wavernnvocoder.py does not work.
Sample:
test_0_mel.npy_orig.wav.zip
I don't know why.

geneing (Owner) commented Mar 21, 2019

@maozhiqiang Could you please attach the mel data you are using as input? Then I can try to reproduce your problem.

maozhiqiang (Author) commented:

@geneing Thank you! The test mels are attached:
test_0_mel.npy.zip

geneing (Owner) commented Mar 22, 2019

@maozhiqiang Works for me with b2f5fc1.
Screenshot_20190321_232140

test.zip

Commands:

import numpy as np
import librosa
import sys
import matplotlib.pyplot as plt

sys.path.insert(0, '../WaveRNN-Pytorch/lib/build-src-RelDebInfo')

import WaveRNNVocoder

mel = np.load('eval/test_0_mel.npy')

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('../WaveRNN-Pytorch/model_outputs/model.bin')
wav = vocoder.melToWav(mel)
plt.plot(wav)
plt.show()
librosa.output.write_wav('test.wav', wav, 16000)

The speech is a bit noisy and quiet. Cantonese?

maozhiqiang (Author) commented:

@geneing Thank you! My result is still noise, even though I am running the same code. I don't know why.

geneing (Owner) commented Mar 23, 2019

I'm not sure how to help you. Here's the weight file I'm using.

model.bin.zip

maozhiqiang (Author) commented:

Thank you! I will try it!

alexdemartos commented:

Hi! Thank you for this awesome work!

I successfully trained the Pytorch WaveRNN model:

input_type='bits',
bits=10,
rnn_dims=800,
fc_dims=256,
pad=2,
upsample_factors=(4, 4, 16),
compute_dims=128,
res_out_dims=64*2,
res_blocks=10

Now I am trying to run inference on the CPU using the C++ library. I compiled the library and ran convert_model.py, but when I try to run inference I get Aborted (core dumped).

If I use the model weights you shared in the above comment it runs perfectly fine.

Anything I might've missed here?

Thanks for your help :)

geneing (Owner) commented May 3, 2019

@alexdemartos Would it be possible for you to obtain the stack trace when this error happens? You may have to recompile in debug mode and either run under gdb or open the core file. That will make it a lot easier to find the cause.
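For reference, a minimal gdb session (a sketch; binary name and arguments taken from the later comment in this thread):

gdb --args ./vocoder -w model.bin -m mels.npy
(gdb) run
(gdb) bt        # print the stack trace after the crash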

alexdemartos commented May 4, 2019

Hi @geneing,

Thanks for your fast response. Sorry, I am not very experienced with C++ debugging. I compiled the library in debug mode, but I don't really know how to get more detailed info. This is the error from the .so library:

MemoryError: std::bad_alloc

I tried to debug the vocoder binary with gdb, but it crashes when loading the mel (.npy) file (even with your model):

(gdb) run -w model.bin -m mels.npy
Starting program: /home/ubuntu/git/WaveRNN-Pytorch/library/debug/vocoder -w model.bin -m mels.npy
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Nevertheless, I noticed that loading your model gives the following details:

Loading: model.bin
Loading:Conv1d(80, 64, kernel_size=(5,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,))
Loading:Stretch2d()
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 9), stride=(1, 1), padding=(0, 4), …
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 11), stride=(1, 1), padding=(0, 5), …
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 21), stride=(1, 1), padding=(0, 10)…
Loading:Linear(in_features=112, out_features=128, bias=True)
Loading:GRU(128, 128, batch_first=True)
Loading:Linear(in_features=160, out_features=128, bias=True)
Loading:Linear(in_features=128, out_features=512, bias=True)

While loading mine, the last part of the model is missing:

Loading: checkpoints/model.bin
Loading:Conv1d(80, 128, kernel_size=(5,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runn…
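For reference, the layers actually present in the checkpoint can be listed with a short script (a sketch; the checkpoint path and the 'state_dict' key layout are assumptions):

import torch

ckpt = torch.load('checkpoints/checkpoint.pth', map_location='cpu')  # hypothetical path
state = ckpt.get('state_dict', ckpt)  # unwrap if the trainer saved a dict around the weights
for name, tensor in state.items():
    print(name, tuple(tensor.shape))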

acrosson commented:

@geneing I tested out the model you have in model_outputs/model.bin.

It's taking me 9.5 seconds to generate 6 seconds of audio. Also, the audio output quality is pretty poor:
sample.wav

  1. What performance should I expect if the model were converted to use CUDA?
  2. How can I improve the audio quality?

geneing (Owner) commented May 16, 2019

@acrosson The network in this repo is designed for best performance on a CPU - low op count, with branching and memory access optimized for pipelined processors. For best performance on a GPU you would use something like WaveGlow - no branching, and a massive op count amortized over thousands of simple compute cores.
On my dinky old laptop, I can synthesize 9.1 sec of audio in 8.1 seconds on a single CPU core. There are still some opportunities for further optimization. Either your computer is even slower than mine, or there is something suboptimal about the code optimization (the correct -O and -march flags are important).
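To check the real-time factor on your machine, here is a minimal timing sketch using the Python binding shown earlier in this thread (paths and the 16000 Hz sample rate are assumptions from other comments):

import sys, time
import numpy as np

sys.path.insert(0, 'lib/build-src-RelDebInfo')  # adjust to your build directory
import WaveRNNVocoder

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')
mel = np.load('eval/test_0_mel.npy')

start = time.time()
wav = vocoder.melToWav(mel)
elapsed = time.time() - start
audio_sec = len(wav) / 16000
print(f'{audio_sec:.2f}s of audio in {elapsed:.2f}s (real-time factor {elapsed / audio_sec:.2f})')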

For the sound quality, let's check if it's due to pruning. I observe that the quality drop with pruning is quite sharp past some "critical" pruning fraction. This "critical" fraction depends on the dataset used for training. When training with noisier datasets, I observe that I have to keep more weights after pruning to maintain sound quality.

If you go to your checkpoints/eval directory, you should have wav outputs every 10K steps or so. Listen to the output at around step 40000. If it sounds OK, then check later steps. The step at which it starts to sound bad tells you what fraction of the weights you can prune.
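(For context, pruned WaveRNN implementations typically ramp sparsity with the cubic schedule from "Efficient Neural Audio Synthesis", Kalchbrenner et al., 2018. A sketch with hypothetical step parameters:)

def sparsity_at_step(step, start=20000, ramp=200000, target=0.90):
    # Cubic schedule: sparsity rises from 0 at `start` to `target` after
    # `ramp` more steps, pruning fastest early and tapering off at the end.
    if step <= start:
        return 0.0
    t = min(1.0, (step - start) / ramp)
    return target * (1.0 - (1.0 - t) ** 3)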

For training with https://github.com/mozilla/TTS/ I can prune up to 90% of the weights with little impact on quality. I applied a band-pass FIR filter (95-7600 Hz) to the M-AILABS dataset I used for training.
Here's an example of speech synthesized from text: https://drive.google.com/open?id=1mrV_1RuKOyZxk4gp_81A7l9FmX9qhPAt

Here's one synthesized from mels: https://drive.google.com/open?id=1T-D3jHrI8tlb9EwohaAEdFP0ddwK7LfJ
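A minimal sketch of that kind of band-pass preprocessing (assuming scipy; the exact filter design used above is not specified):

import numpy as np
from scipy.signal import firwin, lfilter

def bandpass(wav, sr=16000, low_hz=95.0, high_hz=7600.0, numtaps=255):
    # Linear-phase FIR band-pass: pass_zero=False makes firwin treat the
    # two cutoffs as the edges of a pass band between low_hz and high_hz.
    taps = firwin(numtaps, [low_hz, high_hz], pass_zero=False, fs=sr)
    return lfilter(taps, [1.0], wav).astype(np.float32)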

1105060120 commented:

@maozhiqiang How do you export the model for C++ inference? Thank you.

LifaSun commented Dec 6, 2019

Hi all,
When I run "python test_wavernnvocoder.py", I get this error:

ImportError: /WaveRNN-Pytorch/library/build/WaveRNNVocoder.so: undefined symbol: PyThread_tss_get

Can anyone tell me how to fix it? Thank you very much!

li-xx-5 commented May 24, 2021

@geneing Hello, I use the hparams below and get a good result, but inference takes about 8 s. Is there any way to speed up inference? Thank you.

Model parameters:

rnn_dims=400,
fc_dims=256,
pad=2,
upsample_factors=(4, 5, 10),
compute_dims=64,
res_out_dims=32*2,
res_blocks=3,
