Tdnn pool #3 (Open)

wants to merge 1,866 commits into master
Conversation

vijayaditya (Owner)

No description provided.

vdp and others added 30 commits April 8, 2016 13:44
rm/s5: Don't use the no-longer-existing '--max-arcs' option when calling gmm-latgen-faster

The binary complains that there is no such option and exits abnormally, which in turn causes local/test_decoders.sh to fail with a scary error message.
12 hours is the estimate I got when I ran the recipe:
```
root@27af4331c701:/opt/kaldi/egs/swbd/s5c/data/train_30kshort
$ awk '{s+=$4-$3}END{print s}' segments 
43022.7
```
(Each line of a segments file is `<utt-id> <rec-id> <start> <end>` with times in seconds, so the sum is the total audio: 43022.7 s / 3600 ≈ 11.95 hours.)
…with online decoding (and to enable a fix to the --snip-edges=false bug).
…o write online feature-extraction code that would respect the snip-edges=false option.
- the functionality stays the same, but it is now more 'correct', as we
  no longer do the triple cast through 'bool'.
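For context, the --snip-edges option changes how many frames a waveform yields: with --snip-edges=true only windows that fit entirely inside the signal produce frames, while with --snip-edges=false frames are centered on multiples of the frame shift and the edges are padded, which is what makes a streaming/online implementation harder. A minimal sketch of the two frame-count conventions as I understand them (the helper is illustrative, not Kaldi's actual API):
```
#include <cstdint>
#include <cstdio>

// Illustrative frame-count conventions (hypothetical helper, not Kaldi's API).
int64_t NumFrames(int64_t num_samples, int64_t window, int64_t shift,
                  bool snip_edges) {
  if (snip_edges) {
    // Count only windows that lie entirely inside the signal.
    return num_samples < window ? 0 : 1 + (num_samples - window) / shift;
  } else {
    // One frame centered on each multiple of 'shift'; edges are padded.
    return (num_samples + shift / 2) / shift;
  }
}

int main() {
  // 1 s of 16 kHz audio, 25 ms window (400 samples), 10 ms shift (160).
  printf("snip-edges=true : %lld\n", (long long)NumFrames(16000, 400, 160, true));   // 98
  printf("snip-edges=false: %lld\n", (long long)NumFrames(16000, 400, 160, false));  // 100
  return 0;
}
```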
moving the src/path.sh into tools/config/common_path.sh
Fixing stuff and renaming to am_nnet.
WIP: fix for utt2dur when applied to speed-perturbed data
…es option; includes rewrite of window extraction code.
…the HUB4 corpus, update to local/make_bn.py so that it is more flexible with regard to differences in source directory format
smbr: Avoid extra epochs if frame shift is not used during training
KarelVesely84 and others added 28 commits May 18, 2016 20:38
…ming more generic,

- the binary can be replaced (so we could eventually append posteriors, features, etc.)
A new CUDA kernel for CuMatrixBase<Real>::FindRowMaxId;
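Kernels like FindRowMaxId are typically organized as one thread block per row: each thread scans a strided slice of the row, then a shared-memory tree reduction keeps the (max value, index) pair. A hedged sketch of that pattern (not the kernel this commit adds):
```
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

// One block per row; tree reduction over (max value, argmax) pairs in
// shared memory. blockDim.x must be a power of two. Sketch only.
__global__ void FindRowMaxIdSketch(const float* mat, int cols, int* max_ids) {
  extern __shared__ char smem[];
  float* svals = reinterpret_cast<float*>(smem);
  int* sidx = reinterpret_cast<int*>(svals + blockDim.x);

  const float* row = mat + (size_t)blockIdx.x * cols;
  float best = -FLT_MAX;
  int best_i = -1;
  for (int j = threadIdx.x; j < cols; j += blockDim.x)
    if (row[j] > best) { best = row[j]; best_i = j; }
  svals[threadIdx.x] = best;
  sidx[threadIdx.x] = best_i;
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s && svals[threadIdx.x + s] > svals[threadIdx.x]) {
      svals[threadIdx.x] = svals[threadIdx.x + s];
      sidx[threadIdx.x] = sidx[threadIdx.x + s];
    }
    __syncthreads();
  }
  if (threadIdx.x == 0) max_ids[blockIdx.x] = sidx[0];
}

int main() {
  const int rows = 2, cols = 5, threads = 128;
  float h[rows * cols] = {0, 3, 1, 2, -1,
                          9, 3, 1, 2, -1};
  float* d_mat; int* d_ids; int h_ids[rows];
  cudaMalloc(&d_mat, sizeof(h));
  cudaMalloc(&d_ids, rows * sizeof(int));
  cudaMemcpy(d_mat, h, sizeof(h), cudaMemcpyHostToDevice);
  size_t shmem = threads * (sizeof(float) + sizeof(int));
  FindRowMaxIdSketch<<<rows, threads, shmem>>>(d_mat, cols, d_ids);
  cudaMemcpy(h_ids, d_ids, sizeof(h_ids), cudaMemcpyDeviceToHost);
  printf("argmax: row0=%d row1=%d\n", h_ids[0], h_ids[1]);  // 1 and 0
  cudaFree(d_mat); cudaFree(d_ids);
  return 0;
}
```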
base/kaldi_error: the error messages are no longer printed twice
Add barrier for correct timing.

Original performance:
```
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<float>, for dim = 1024, speed was 4.26727 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<float>, for dim = 1024, speed was 5.97203 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<float>, for dim = 1024, speed was 3.0816 gigaflops.
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<double>, for dim = 1024, speed was 3.95059 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<double>, for dim = 1024, speed was 4.36189 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<double>, for dim = 1024, speed was 2.39275 gigaflops.
```

New performance:
```
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<float>, for dim = 1024, speed was 14.0498 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<float>, for dim = 1024, speed was 16.845 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<float>, for dim = 1024, speed was 14.2464 gigaflops.
LOG (TestCuMatrixTransposeCross():cu-matrix-speed-test.cc:91) For CuMatrix::TransposeCross<double>, for dim = 1024, speed was 10.4523 gigaflops.
LOG (TestCuMatrixTransposeS():cu-matrix-speed-test.cc:72) For CuMatrix::TransposeS<double>, for dim = 1024, speed was 9.65529 gigaflops.
LOG (TestCuMatrixTransposeNS():cu-matrix-speed-test.cc:56) For CuMatrix::TransposeNS<double>, for dim = 1024, speed was 8.52148 gigaflops.
```
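The barrier matters because CUDA kernel launches are asynchronous: a host-side timer has to be bracketed by synchronization, before starting (to drain previously queued work, which would otherwise be billed to the kernel being measured) and after launching (so the kernel actually finishes before the clock stops). A minimal illustration of the pattern, not the actual test harness:
```
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void Busy(float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
  const int n = 1 << 20;
  float* d;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemset(d, 0, n * sizeof(float));

  cudaDeviceSynchronize();               // barrier: drain pending work
  auto t0 = std::chrono::steady_clock::now();
  Busy<<<(n + 255) / 256, 256>>>(d, n);  // returns immediately...
  cudaDeviceSynchronize();               // ...so wait for completion
  auto t1 = std::chrono::steady_clock::now();

  printf("kernel took %.1f us\n",
         std::chrono::duration<double, std::micro>(t1 - t0).count());
  cudaFree(d);
  return 0;
}
```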
add new results for the Multi-splice version of the Librispeech online recipe, including results on the test set.
2 CUDA kernels for TraceMatMat with/without transpose for all matrix sizes.

New:
```
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 10.1076 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 11.8711 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 7.10019 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 7.81977 gigaflops.
```

Old:
```
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float>, for dim = 1024, speed was 4.57783 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<float> [transposed], for dim = 1024, speed was 7.96795 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double>, for dim = 1024, speed was 3.61182 gigaflops.
LOG (TestCuMatrixTraceMatMat():cu-matrix-speed-test.cc:458) For CuMatrix::TraceMatMat<double> [transposed], for dim = 1024, speed was 6.39571 gigaflops.
```
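The two cases have very different memory behavior: tr(A B^T) = sum_ij A(i,j)*B(i,j) is an elementwise dot product with fully coalesced reads of both matrices, while tr(A B) = sum_ij A(i,j)*B(j,i) walks B with a column stride, which is why separate kernels are worthwhile. A sketch of the transposed case only (illustrative, float-only since double atomicAdd needs sm_60+; not the actual Kaldi kernels):
```
#include <cstdio>
#include <cuda_runtime.h>

// tr(A * B^T) = sum over all elements of A[i]*B[i] when both are stored
// row-major: a grid-stride elementwise dot product, reduced per block in
// shared memory, with one atomicAdd per block into the result.
__global__ void TraceMatMatTransSketch(const float* a, const float* b,
                                       int n_elems, float* result) {
  __shared__ float partial[256];  // blockDim.x must be 256
  float sum = 0.0f;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n_elems;
       i += gridDim.x * blockDim.x)
    sum += a[i] * b[i];
  partial[threadIdx.x] = sum;
  __syncthreads();
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
    __syncthreads();
  }
  if (threadIdx.x == 0) atomicAdd(result, partial[0]);
}

int main() {
  const int dim = 1024, n = dim * dim;
  float *a, *b, *r;  // unified memory keeps the demo short
  cudaMallocManaged(&a, n * sizeof(float));
  cudaMallocManaged(&b, n * sizeof(float));
  cudaMallocManaged(&r, sizeof(float));
  for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
  *r = 0.0f;
  TraceMatMatTransSketch<<<128, 256>>>(a, b, n, r);
  cudaDeviceSynchronize();
  printf("tr(A B^T) = %.0f (expect %d)\n", *r, 2 * n);
  return 0;
}
```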
smbr: Fixed minor bug in generating diagnostics egs
Speed up CuMatrix<Real>::Transpose() and transposed copy from matrix
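A factor-of-three jump like the one logged above is what shared-memory tiling typically buys: a naive transpose makes either its reads or its writes uncoalesced, whereas staging 32x32 tiles in shared memory, padded by one column to avoid bank conflicts, keeps both sides coalesced. A sketch of that classic pattern, not necessarily the kernel this commit added:
```
#include <cuda_runtime.h>

#define TILE 32

// out (cols x rows) = transpose of in (rows x cols), both row-major.
// The tile is loaded with coalesced reads and written back with
// coalesced writes; the +1 column avoids shared-memory bank conflicts
// on the transposed access. Launch with dim3 block(TILE, TILE) and
// dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE).
__global__ void TransposeTiled(const float* in, float* out,
                               int rows, int cols) {
  __shared__ float tile[TILE][TILE + 1];

  int c = blockIdx.x * TILE + threadIdx.x;  // column in 'in'
  int r = blockIdx.y * TILE + threadIdx.y;  // row in 'in'
  if (r < rows && c < cols)
    tile[threadIdx.y][threadIdx.x] = in[r * cols + c];
  __syncthreads();

  int tc = blockIdx.y * TILE + threadIdx.x;  // column in 'out'
  int tr = blockIdx.x * TILE + threadIdx.y;  // row in 'out'
  if (tr < cols && tc < rows)
    out[tr * rows + tc] = tile[threadIdx.x][threadIdx.y];
}
```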
some cosmetic changes: add comments to RNNLM rescoring utilities to r…
added utils/combine_ali_dirs.sh (fixes kaldi-asr#553).
Add missing dependencies to Makefiles
Add dimension check in online-nnet3 decoding code, so we get more mea…
Fix bug: static link to MKL 11.3.2 failed.

```
$ ./configure --mkl-root=/opt/intel/mkl --static-math=yes
...
Configuring MKL library directory: Found: /opt/intel/mkl/lib/intel64
MKL configured with threading: sequential, libs:  -Wl,--start-group /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/mkl/lib/intel64/libmkl_core.a /opt/intel/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group
MKL include directory configured as: /opt/intel/mkl/include
Configuring MKL threading as sequential
MKL threading libraries configured as   -lpthread -lm
Using Intel MKL as the linear algebra library.

/opt/intel/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_set_memory_limit':
mkl_memory.c:(.text+0x49c): undefined reference to `dlsym'
mkl_memory.c:(.text+0x4b2): undefined reference to `dlsym'
mkl_memory.c:(.text+0x4c8): undefined reference to `dlsym'
/opt/intel/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_allocate':
mkl_memory.c:(.text+0x1251): undefined reference to `dlsym'
mkl_memory.c:(.text+0x1267): undefined reference to `dlsym'
...
```

(`dlsym` is provided by libdl, so the static MKL link line also needs `-ldl`.)
vijayaditya pushed a commit that referenced this pull request Nov 30, 2016
added the option trainer.deriv-truncate-margin to train_rnn.py and tr…