[src] Lots of changes: first stab at kaldi10 (non-compatible version of kaldi) #3083

Open

wants to merge 176 commits into master

176 commits
f5f02d1
[src] Lots of changes: first stab at kaldi10 (non-compatible version …
danpovey Mar 10, 2019
cc1d251
Merge master into kaldi10 (#3105)
desh2608 Mar 14, 2019
f28516a
[src] Add Vector strides, beginning draft of tensor stuff (#3120)
danpovey Mar 15, 2019
b9efc54
Merge with master branch
desh2608 Mar 15, 2019
f93749a
[src] More work on tensor library draft in kaldi10 (#3124)
danpovey Mar 17, 2019
a3eeb7c
merged 'master' into kaldi10 and resolved conflicts
desh2608 Mar 17, 2019
f59489f
Merge pull request #3129 from desh2608/kaldi10_new
danpovey Mar 17, 2019
21a3913
Kaldi10 (#3131)
danpovey Mar 17, 2019
eca0e80
[src] More drafting of tensor related stuff (#3132)
danpovey Mar 18, 2019
63e35b0
[src] completed stride support for kaldi-vector (#3146)
YiwenShaoStephen Mar 20, 2019
c4a326e
Some cleanups in matrix/; more work on tensor draft (#3150)
danpovey Mar 20, 2019
4cab3db
Merged with 'master' (#3156)
desh2608 Mar 21, 2019
910ec50
[src] More tensor draft stuff; add simple test of vector stride
danpovey Mar 22, 2019
9bba411
Merge pull request #3161 from danpovey/kaldi10
danpovey Mar 22, 2019
5fa86ad
Kaldi10 (#3167)
danpovey Mar 26, 2019
7acef2a
Kaldi10: Implement topology (#3169)
hhadian Mar 26, 2019
26edaf6
Kaldi10: Add missing file (from PR #3169) + minor fixes (#3170)
hhadian Mar 26, 2019
5117c63
merged with master
desh2608 Mar 26, 2019
fe57b4a
Merge pull request #3173 from desh2608/kaldi10_merge
danpovey Mar 26, 2019
d6634f7
Implement most of transitins.cc (#3184)
hhadian Mar 29, 2019
20c73b5
[src] Kaldi10 changes: remove vector strides, more tensor progress. (…
danpovey Mar 31, 2019
885249a
[src] Kaldi10, more tensor progress (#3189)
danpovey Mar 31, 2019
e36034e
Implement the rest of transitions.cc + tests (#3198)
hhadian Apr 2, 2019
f8adced
Remove lingering HTK support fully. (#3201)
galv Apr 3, 2019
6d5c87b
Fix compilation of posterior.cc (#3200)
galv Apr 3, 2019
8fa9d18
Kaldi10: more tensor drafting. (#3246)
danpovey Apr 18, 2019
4a6b739
[src] Tensor progress
danpovey Apr 20, 2019
99873c6
[src] Further progress
danpovey Apr 22, 2019
493efff
[src] Lots more progress, still in flux
danpovey Apr 24, 2019
2a85204
[src,egs] Update code related to pdf-class to be 1-based. (#3278)
hhadian Apr 28, 2019
af6a30b
[src] More drafting on tensor code
danpovey Apr 30, 2019
3647d2d
[kaldi10] WIP hmm-utils.cc
galv Apr 16, 2019
f0e32f9
Clean up based on feedback.
galv Apr 17, 2019
a8b5580
hmm-utils.cc: Everything compiles except for AddSelfLoops.
galv Apr 18, 2019
af91c73
hmm-utils.cc compiles, except for a bizarre problem with the copy-ass…
galv Apr 18, 2019
25ebce5
Successfully compile all binaries other than cuda-gpu-available.cc
galv May 2, 2019
c0b6042
[src] Further progress
danpovey May 3, 2019
82ec8b7
Merge pull request #3290 from danpovey/kaldi10
danpovey May 3, 2019
24a85c9
[src] Further progress
danpovey May 4, 2019
d588595
Merge pull request #3292 from danpovey/kaldi10
danpovey May 4, 2019
3b2fee1
[src] further tensor progress
danpovey May 4, 2019
a751cf4
[src] Tensor progress; rename some files
May 5, 2019
670b2d5
[src] Small changes
May 5, 2019
cf95fe4
[src] Add definition
danpovey May 5, 2019
26121fc
Merge pull request #3294 from danpovey/kaldi10
danpovey May 5, 2019
5d5d387
[src] Various progress
danpovey May 12, 2019
f6e9281
[src] Small tensor changes prior to rewrite
danpovey May 17, 2019
931496f
[src] Some major changes in rough draft
danpovey May 24, 2019
3c3d9a9
[src] Lots of tensor changes
danpovey May 31, 2019
3dd5e7e
[src] Refactoring of matrix directory to separate out the cblas wrapp…
danpovey Jun 17, 2019
b254c83
[src] Add more things to cblasext
danpovey Jun 17, 2019
f566e81
[scripts] Fix non-randomness in getting utt2uniq, introduced in #3142…
desh2608 Mar 27, 2019
560594e
[build] Don't build for Tegra sm_XX versions on x86/ppc and vice vers…
luitjens Mar 27, 2019
4264512
[egs] Fixes Re encoding to IAM, uw3 recipes (#3012)
aarora8 Mar 29, 2019
6787282
[src] Efficiency improvement and extra checking for cudamarix, RE def…
luitjens Mar 30, 2019
8a1acde
[egs] Fix small typo in tedlium download script (#3178)
Shujian2015 Mar 30, 2019
5f00d0d
[github] Add GitHub issue templates (#3187)
Mar 31, 2019
6e998a9
[build] Add missing dependency to Makefile (#3191)
danpovey Mar 31, 2019
bf0af1d
[src] Fix bug in pruned lattice rescoring when input lattice has epsi…
hainan-xv Apr 1, 2019
7371a95
[scripts] Fix bug in extend_lang.sh regarding extra_disambig.txt (#3195)
armusc Apr 2, 2019
32496b4
[egs] Update Tedlium s5_r3 example with more up-to-date chain TDNN co…
jyhnnhyj Apr 3, 2019
43ba4f2
[scripts] Fix bug in extend_lang.sh causing validation failure w/ ext…
jty016 Apr 3, 2019
c737d94
[scripts] Bug-fix in make_lexicon_fst.py, which failed when --sil-pro…
armusc Apr 4, 2019
57d63cc
[egs] Fix very small typo in run_tdnn_1b.sh (#3207)
Shujian2015 Apr 4, 2019
9393b66
[build] Tensorflow version update (#3204)
langep Apr 4, 2019
4efc486
[src] Optimizations to CUDA kernels (#3209)
kangshiyin Apr 6, 2019
59523dc
[src] Move curand handle out of CuRand class and into CuDevice. (#3196)
luitjens Apr 7, 2019
da729a5
[build] Make MKL the default BLAS library, add installation scripts (…
Apr 7, 2019
c8ada0c
[build] check for i686 as a valid prefix for Android triplets (#3213)
Dr-Desty-Nova Apr 7, 2019
e1ac00f
[build] Fix configure breakage from #3194 (MKL default)
Apr 9, 2019
c54b5e5
[build] Add missing line continuation '\' in tfrnnlmbin/Makefile (#3218)
teinhonglo Apr 10, 2019
519493f
[src] Fix nnet2 DctComponent test failure (#3225)
huangruizhe Apr 12, 2019
d7685cb
[src] Update CUDA code to avoid synchronization errors on compute ca…
kangshiyin Apr 12, 2019
cbdb930
[src] fix nnet2 DCTCompnent test failure -- removing anther dct_keep_…
huangruizhe Apr 12, 2019
d22530f
[build] Remove references to deprecated MKL libs in gst_plugin (#3229)
Apr 14, 2019
e0cce5b
[scripts] Fix default params in nnet3 segmentation script (#3230)
rezame Apr 14, 2019
cbd1aa3
[src] Correct sanity check in nnet-example-utils.cc (nnet3) (#3232)
KarelVesely84 Apr 16, 2019
a12ee73
Revert "[src] Update CUDA code to avoid synchronization errors on co…
danpovey Apr 16, 2019
b2f9c54
[build] .gitignore autogenerated /tools/python/ (#3241)
mcalahan Apr 17, 2019
3642739
[scripts] Enhance argument checks in nnet3/align_lats.sh (#3243)
Apr 18, 2019
507145f
[egs] invoke 'python2.7' not 'python' when using mmseg (#3244)
naxingyu Apr 18, 2019
db8ed5b
[scripts] Make getting nnet3 model context more robust (#3247)
KarelVesely84 Apr 18, 2019
f8de5a8
[egs] Fix hkust_data_prep.sh w.r.t. iconv mac compatibility issue (#3…
zh794390558 Apr 19, 2019
68ad4e9
[egs] Update RM chain recipe with more recent configuration (#3237)
indra622 Apr 19, 2019
4831a66
[egs] Make voxceleb recipe work with latest version of the dataset (…
sunshines14 Apr 19, 2019
0534e49
[egs] Improve chain example script for Resource Management (RM) (#3252)
indra622 Apr 21, 2019
db2ed32
[src] GPU-related changes for speed and correctness on newer arch's. …
luitjens Apr 22, 2019
479c732
[egs] Update voxceleb v1 preparation scripts (#3255)
jyhnnhyj Apr 23, 2019
8c197b4
[build] Note default=MKL; cosmetic fix (#3257)
nshmyrev Apr 23, 2019
56dc8d9
[egs] Fix to hkust_data_prep.sh w.r.t. how mmseg is checked for (#3240)
zh794390558 Apr 23, 2019
16c9270
[egs] In WSJ run_ivector_common.sh, expose i-vector #jobs config to r…
KarelVesely84 Apr 23, 2019
57205cf
[egs] Add Spanish dimex100 example (#3254)
alx741 Apr 23, 2019
a756df2
[build] Build and configure OpenBLAS; default to it on non-x64 machin…
Apr 25, 2019
121dbbe
[scripts] Fix of a bug in segmentation.pl (#3256)
songyf Apr 25, 2019
a0b6f3f
[src] Fixes to cuda unit tests. (#3268)
luitjens Apr 25, 2019
c415cba
[src] Adding GPU/CUDA lattice batched decoder + binary (#3114)
hugovbraun Apr 26, 2019
4231107
[src] Fix unit-test failure UnitTestCuMatrixSetRandn() (#3274)
DongjiGao Apr 27, 2019
25c7289
[src,build] Removed cusolver for now (not needed yet; caused build p…
huangruizhe Apr 27, 2019
e3abc65
[scripts] Make fix_data_dir.sh remove utterances which have bad durat…
hhadian Apr 30, 2019
7a93e7f
[scripts] Make generate_plots.py python3-compatible (#3280)
May 1, 2019
c9a1257
[scripts] Add --one-based option to split_scp.pl (#3279)
xsawyerx May 1, 2019
aae8be4
[scripts] Allow UTF utterance-ids by removing unnecessary assert (#3283)
rezame May 1, 2019
803e3ee
[src] Keep nnet output in the [-30,30] range required by chain denomi…
danpovey May 2, 2019
b44f708
[scripts] Clean up filehandle usage in split_scp.pl (#3285)
xsawyerx May 2, 2019
7055784
[src] Fix to bug in online-feature.cc that caused crash at end of utt…
danpovey May 2, 2019
1bcea23
[scripts] Use correct compile-time regex syntax in split_scp.pl (#3287)
xsawyerx May 2, 2019
bfbe861
[scripts] Fix a typo in steps/dict/learn_lexicon_bayesian.sh (#3288)
xiaohui-zhang May 2, 2019
61b2347
[egs,scripts] Scripts and an example of BPE-based sub-word decoding (…
DongjiGao May 5, 2019
49bccbb
[scripts] Add trainer option --trainer.optimization.num-jobs-step (#3…
May 7, 2019
8209d18
[egs] Add MGB-5 recipe; https://arabicspeech.org/mgb5 (#3299)
May 8, 2019
5fbc9eb
Revert "[scripts] Clean up filehandle usage in split_scp.pl (#3285)" …
danpovey May 9, 2019
b78d92e
[src] Fix bug in GeneralMatrix::Uncompress() (#3304)
bringtree May 9, 2019
fee2acd
[doc] add an omission in Doxyfile (#3309)
May 10, 2019
de81d0c
[scripts] Fix utils/split_scp.pl breakage (#3308)
May 10, 2019
3453b5a
[egs] Bug-fix to shebang in fisher_callhome_spanish (#3312)
saikiranvalluri May 11, 2019
5ca7f58
[scripts] Fix error messages in run.pl (#3314)
May 11, 2019
e2dc9c3
[egs] New chime-5 recipe (#2893)
vimalmanohar May 12, 2019
e330320
[scripts,egs] Made changes to the augmentation script to make it work…
phanisankar-nidadavolu May 13, 2019
2826b35
[egs] updated local/musan.sh to steps/data/make_musan.sh in speaker i…
phanisankar-nidadavolu May 13, 2019
c695bbc
[src] Fix sample rounding errors in extract-segments (#3321)
May 14, 2019
cfa48eb
[src,scripts]Store frame_shift, utt2{dur,num_frames}, .conf with feat…
May 14, 2019
a1343bd
[build] Initial version of Docker images for (CPU and GPU versions) (…
mdoulaty May 15, 2019
9569384
[scripts] fix typo/bug in make_musan.py (#3327)
wonkyuml May 15, 2019
94aef8d
[scripts] Trust frame_shift and utt2num_frames if found (#3313)
May 16, 2019
9ae4a5c
[scripts] typo fix in augmentation script (#3329)
wonkyuml May 16, 2019
74ebdee
[scripts] handle frame_shit and utt2num_frames in utils/ (#3323)
May 16, 2019
d1c49bf
[scripts] Extend combine_ali_dirs.sh to combine alignment lattices (#…
May 17, 2019
bcfcad7
[src] Fix rare case when segment end rounding overshoots file end in …
alumae May 17, 2019
264372c
[scripts] Change --modify-spk-id default to False; back-compatibility…
phanisankar-nidadavolu May 20, 2019
485c248
[build] Add easier configure option in failure message of configure (…
danpovey May 20, 2019
e3ece34
[scripts,minor] Fix typo in comment (#3338)
Shujian2015 May 22, 2019
d03c16e
[src,egs] Add option for applying SVD on trained models (#3272)
saikiranvalluri May 23, 2019
33a16d8
[src] Add interfaces to nnet-batch-compute that expects device input.…
luitjens May 23, 2019
1e8260b
[build] Update GCC support check for CUDA toolkit 10.1 (#3345)
entn-at May 27, 2019
10bb5de
[egs] Fix to aishell1 v1 download script (#3344)
naxingyu May 27, 2019
d8d3b86
[scripts] Support utf-8 files in some scripts (#3346)
vimalmanohar May 28, 2019
75a69d9
[scripts]: add warning to nnet3/chain/train.py about ineffective opti…
bringtree May 28, 2019
448c876
[src] Misc tensor progress
danpovey Jun 3, 2019
5937fae
[src] small change
danpovey Jun 5, 2019
602ae12
[src] tensor progress
danpovey Jun 10, 2019
32101ba
[src] Change name from kGpuDevice to kCudaDevice
danpovey Jun 10, 2019
33c36bb
[src] More tensor progress
danpovey Jun 14, 2019
b247f30
[src] Progress on standard cuda kernels for tensor directory
danpovey Jun 15, 2019
553f4a8
[src] TEnsor progress.
danpovey Jun 19, 2019
c188496
[src] Merge upstream (may be other merges going on here too.)
danpovey Jun 19, 2019
689a42c
Merge branch 'kaldi10-hmm-utils' of https://github.com/galv/kaldi int…
danpovey Jun 19, 2019
935b151
[src] Minor changes / fixes
danpovey Jun 19, 2019
aa499a3
Merge remote-tracking branch 'origin/kaldi10-temp' into kaldi10
danpovey Jun 19, 2019
1b4dec7
[build] Add missing Makefile
danpovey Jun 19, 2019
c349ef5
[src] Changes to make more things compile
danpovey Jun 20, 2019
ebc6f83
[src] Partial changes to cudafeat, giving up for now
danpovey Jun 20, 2019
5b0c098
[src] Various changes to get it to compile
danpovey Jun 22, 2019
038ea06
[src] Bug-fixes/rewrites to fix test failures in hmm-utils-test
danpovey Jun 22, 2019
42942f5
[src] Various changes to make test pass
danpovey Jun 23, 2019
9dd4f63
[src] One last fix to make tests pass
danpovey Jun 23, 2019
a9c96f6
[src] Changing numbering of pattern preconditions
danpovey Jul 3, 2019
57a8d0e
[scripts,egs] Removing no-longer-existing options like --transition-s…
danpovey Jul 10, 2019
f4b8f53
[src] Various fixes
danpovey Jul 10, 2019
d91a020
[src] Add back lattice-add-trans-probs
danpovey Jul 10, 2019
10400f4
[src,scripts] Various fixes related to kaldi10 topo changes
danpovey Jul 11, 2019
10663ad
[src,scripts] Fixes to kaldi10 branch to make things work
danpovey Jul 14, 2019
252fedf
[build] Add missing SUBDIR to Makefile (#3466)
naxingyu Jul 15, 2019
bf16577
[src] restore cuda-compiled to kaldi10 (#3471)
naxingyu Jul 17, 2019
5a30b71
Add feature transform; remove train transition (#3474)
naxingyu Jul 18, 2019
db6b23d
[src] Fixes RE unusual topologies
danpovey Jul 18, 2019
1cbb691
[src] Fixes RE unusual topologies (#3478)
danpovey Jul 18, 2019
6ba25b5
[src] Fixes RE unusual topologies (#3481)
danpovey Jul 19, 2019
7a53503
[src] Fixes RE unusual topologies (#3480)
danpovey Jul 22, 2019
3a1e523
Kaldi10 feature-changes + attention/transformer scripts (#3562)
danpovey Sep 3, 2019
5514685
Merged with master but have not cleaned up its effects yet.
danpovey Sep 30, 2019
d844498
Properly merge online2bin dir from master (previously accidentally lo…
danpovey Sep 30, 2019
85cbf75
Add link kaldi->src, will eventually move the dir to be named 'kaldi'
danpovey Oct 1, 2019
312e687
Various kaldi10 fixes after merge
danpovey Oct 2, 2019
234d00d
Merge some previous fixes in (CAUTION: there was something about kNoL…
danpovey Oct 2, 2019
4b8bab0
[src] Clarification in comment
danpovey Oct 2, 2019
b322275
Fix some comments
danpovey Dec 11, 2019
@@ -102,7 +102,7 @@ if [ $stage -le 0 ]; then
   fi
   utils/data/get_uniform_subsegments.py \
     --max-segment-duration=$window \
-    --overlap-duration=$(echo "$window-$period" | bc) \
+    --overlap-duration=$(perl -e "print ($window-$period);") \
     --max-remaining-duration=$min_segment \
     --constant-duration=True \
     $segments > $dir/subsegments
2 changes: 1 addition & 1 deletion egs/callhome_diarization/v1/run.sh
@@ -188,7 +188,7 @@ if [ $stage -le 6 ]; then
 
   der=$(grep -oP 'DIARIZATION\ ERROR\ =\ \K[0-9]+([.][0-9]+)?' \
     exp/tuning/${dataset}_t${threshold})
-  if [ $(echo $der'<'$best_der | bc -l) -eq 1 ]; then
+  if [ $(perl -e "print ($der < $best_der ? 1 : 0);") -eq 1 ]; then
     best_der=$der
     best_threshold=$threshold
   fi
2 changes: 1 addition & 1 deletion egs/callhome_diarization/v2/run.sh
@@ -297,7 +297,7 @@ if [ $stage -le 10 ]; then
 
   der=$(grep -oP 'DIARIZATION\ ERROR\ =\ \K[0-9]+([.][0-9]+)?' \
     $nnet_dir/tuning/${dataset}_t${threshold})
-  if [ $(echo $der'<'$best_der | bc -l) -eq 1 ]; then
+  if [ $(perl -e "print ($der < $best_der ? 1 : 0);") -eq 1 ]; then
     best_der=$der
     best_threshold=$threshold
   fi
2 changes: 1 addition & 1 deletion egs/dihard_2018/v1/run.sh
@@ -186,7 +186,7 @@ if [ $stage -le 7 ]; then
 
   der=$(grep -oP 'DIARIZATION\ ERROR\ =\ \K[0-9]+([.][0-9]+)?' \
     $ivec_dir/tuning/dihard_2018_dev_t${threshold})
-  if [ $(echo $der'<'$best_der | bc -l) -eq 1 ]; then
+  if [ $(perl -e "print ($der < $best_der ? 1 : 0);") -eq 1 ]; then
     best_der=$der
     best_threshold=$threshold
   fi
2 changes: 1 addition & 1 deletion egs/dihard_2018/v2/run.sh
@@ -260,7 +260,7 @@ if [ $stage -le 12 ]; then
 
   der=$(grep -oP 'DIARIZATION\ ERROR\ =\ \K[0-9]+([.][0-9]+)?' \
     $nnet_dir/tuning/dihard_2018_dev_t${threshold})
-  if [ $(echo $der'<'$best_der | bc -l) -eq 1 ]; then
+  if [ $(perl -e "print ($der < $best_der ? 1 : 0);") -eq 1 ]; then
     best_der=$der
     best_threshold=$threshold
   fi
2 changes: 1 addition & 1 deletion egs/rm/README.txt
@@ -9,7 +9,7 @@ About the Resource Management corpus:
 
 Each subdirectory of this directory contains the
 scripts for a sequence of experiments.
-s5 is the currently recommmended setup.
+s5 is the currently recommended setup.
 
 s5: This is the "new-new-style" recipe. It is now finished.
 All further work will be on top of this style of recipe. Note:
4 changes: 2 additions & 2 deletions egs/sre08/v1/local/score_sre08.sh
@@ -35,11 +35,11 @@ tot_eer=0.0
 printf '% 12s' 'EER:'
 for condition in $(seq 8); do
   eer=$(awk '{print $3}' $scores | paste - $trials | awk -v c=$condition '{n=4+c; if ($n == "Y") print $1, $4}' | compute-eer - 2>/dev/null)
-  tot_eer=$(echo "$tot_eer+$eer" | bc)
+  tot_eer=$(perl -e "print ($tot_eer+$eer);")
   eers[$condition]=$eer
 done
 
-eers[0]=$(echo "$tot_eer/8" | bc -l)
+eers[0]=$(perl -e "print ($tot_eer/8.0);")
 
 for i in $(seq 0 8); do
   printf '% 7.2f' ${eers[$i]}
8 changes: 7 additions & 1 deletion egs/swbd/s5c/local/score_sclite_conf.sh
@@ -39,6 +39,12 @@ for f in $data/stm $data/glm $lang/words.txt $lang/phones/word_boundary.int \
   [ ! -f $f ] && echo "$0: expecting file $f to exist" && exit 1;
 done
 
+if [ -f $dir/../frame_subsampling_factor ]; then
+  factor=$(cat $dir/../frame_subsampling_factor) || exit 1
+  frame_shift_opt="--frame-shift=0.0$factor"
+  echo "$0: $dir/../frame_subsampling_factor exists, using $frame_shift_opt"
+fi
+
 name=`basename $data`; # e.g. eval2000
 
 mkdir -p $dir/scoring/log
@@ -51,7 +57,7 @@ if [ $stage -le 0 ]; then
     ACWT=\`perl -e \"print 1.0/LMWT\;\"\` '&&' \
     lattice-add-penalty --word-ins-penalty=$wip "ark:gunzip -c $dir/lat.*.gz|" ark:- \| \
     lattice-align-words $lang/phones/word_boundary.int $model ark:- ark:- \| \
-    lattice-to-ctm-conf --decode-mbr=$decode_mbr --acoustic-scale=\$ACWT ark:- - \| \
+    lattice-to-ctm-conf $frame_shift_opt --decode-mbr=$decode_mbr --acoustic-scale=\$ACWT ark:- - \| \
     utils/int2sym.pl -f 5 $lang/words.txt \| \
     utils/convert_ctm.pl $data/segments $data/reco2file_and_channel \
     '>' $dir/score_LMWT_${wip}/$name.ctm || exit 1;
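The block added above derives the CTM frame shift from the frame subsampling factor by string concatenation rather than arithmetic: a chain model with factor 3 yields --frame-shift=0.03 instead of the default 0.01. A sketch of that step in isolation (factor value illustrative):

```shell
# "0.0$factor" turns factor 3 into 0.03 with no arithmetic at all.
factor=3
frame_shift_opt="--frame-shift=0.0$factor"
echo "$frame_shift_opt"   # -> --frame-shift=0.03
```

Note the concatenation trick only works for single-digit factors, which holds for the subsampling factors used in practice (typically 3).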
2 changes: 1 addition & 1 deletion egs/wsj/s5/local/chain/tuning/run_tdnn_1g.sh
@@ -160,7 +160,7 @@ if [ $stage -le 15 ]; then
   echo "$0: creating neural net configs using the xconfig parser";
 
   num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
-  learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
+  learning_rate_factor=$(echo "print(0.5/$xent_regularize)" | python)
   tdnn_opts="l2-regularize=0.01 dropout-proportion=0.0 dropout-per-dim-continuous=true"
   tdnnf_opts="l2-regularize=0.01 dropout-proportion=0.0 bypass-scale=0.66"
   linear_opts="l2-regularize=0.01 orthonormal-constraint=-1.0"
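The one-line fix above ("print x" to "print(x)") makes the inline snippet valid under both major Python versions: Python 2 parses it as printing a parenthesized expression, Python 3 as a function call. With the usual xent_regularize=0.1 (python3 used explicitly here for the sketch):

```shell
# Piping a print() expression into the interpreter works under either
# Python 2 or 3 once the parentheses are present.
xent_regularize=0.1
learning_rate_factor=$(echo "print(0.5/$xent_regularize)" | python3)
echo "$learning_rate_factor"   # -> 5.0
```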
20 changes: 19 additions & 1 deletion egs/wsj/s5/steps/libs/nnet3/train/dropout_schedule.py
@@ -186,9 +186,22 @@ def _get_component_dropout(dropout_schedule, data_fraction):
 
 def _get_dropout_proportions(dropout_schedule, data_fraction):
     """Returns dropout proportions based on the dropout_schedule for the
-    fraction of data seen at this stage of training.
+    fraction of data seen at this stage of training. Returns a list of
+    pairs (pattern, dropout_proportion); for instance, it might return
+    the list ['*', 0.625] meaning a dropout proportion of 0.625 is to
+    be applied to all dropout components.
+
+    Returns None if dropout_schedule is None.
+
+    dropout_schedule might be (in the sample case using the default pattern of
+    '*'): '0.1,0.5@0.5,0.1', meaning a piecewise linear function that starts at
+    0.1 when data_fraction=0.0, rises to 0.5 when data_fraction=0.5, and falls
+    again to 0.1 when data_fraction=1.0. It can also contain space-separated
+    items of the form 'pattern=schedule', for instance:
+    '*=0.0,0.5,0.0 lstm.*=0.0,0.3@0.75,0.0'
+    The more specific patterns should go later, otherwise they will be overridden
+    by the less specific patterns' commands.
 
     Calls _get_component_dropout() for the different component name patterns
     in dropout_schedule.
 
@@ -198,6 +211,7 @@ def _get_dropout_proportions(dropout_schedule, data_fraction):
         See _self_test() for examples.
         data_fraction: The fraction of data seen until this stage of
             training.
+
     """
     if dropout_schedule is None:
         return None
@@ -213,6 +227,10 @@ def _get_dropout_proportions(dropout_schedule, data_fraction):
 def get_dropout_edit_string(dropout_schedule, data_fraction, iter_):
     """Return an nnet3-copy --edits line to modify raw_model_string to
     set dropout proportions according to dropout_proportions.
+    E.g. if _dropout_proportions(dropout_schedule, data_fraction)
+    returns [('*', 0.625)], this will return the string:
+    "nnet3-copy --edits='set-dropout-proportion name=* proportion=0.625'"
+
 
     Arguments:
         dropout_schedule: Value for the --trainer.dropout-schedule option.
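The piecewise-linear schedule the docstring above describes can be sketched outside Python as well. A hedged awk implementation (assumptions: plain items pin the endpoints at data fractions 0 and 1, interior items use the proportion@fraction form; `schedule_at` is an illustrative name, not a function from the PR):

```shell
# schedule_at SCHEDULE DATA_FRACTION
# Linearly interpolates a dropout schedule like '0.1,0.5@0.5,0.1' at the
# given fraction of data seen, mirroring the docstring's rule.
schedule_at() {
  echo "$1" | awk -v x="$2" -F',' '{
    n = NF
    for (i = 1; i <= n; i++) {
      # "p@f" gives an explicit breakpoint; a bare "p" sits at 0 or 1.
      if (split($i, a, "@") == 2) { p[i] = a[1]; f[i] = a[2] }
      else { p[i] = $i; f[i] = (i == 1 ? 0.0 : 1.0) }
    }
    for (i = 1; i < n; i++)
      if (x >= f[i] && x <= f[i+1]) {
        printf("%g\n", p[i] + (p[i+1] - p[i]) * (x - f[i]) / (f[i+1] - f[i]))
        exit
      }
  }'
}
schedule_at '0.1,0.5@0.5,0.1' 0.25   # midway up the first segment -> 0.3
```

With the docstring's example schedule, the proportion ramps from 0.1 to its 0.5 peak at the halfway point of training, then back down, which is what the example call illustrates.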
1 change: 1 addition & 0 deletions egs/wsj/s5/steps/libs/nnet3/xconfig/parser.py
@@ -27,6 +27,7 @@
         'relu-batchnorm-layer' : xlayers.XconfigBasicLayer,
         'relu-batchnorm-so-layer' : xlayers.XconfigBasicLayer,
         'batchnorm-so-relu-layer' : xlayers.XconfigBasicLayer,
+        'batchnorm-layer' : xlayers.XconfigBasicLayer,
         'sigmoid-layer' : xlayers.XconfigBasicLayer,
         'tanh-layer' : xlayers.XconfigBasicLayer,
         'fixed-affine-layer' : xlayers.XconfigFixedAffineLayer,
12 changes: 0 additions & 12 deletions egs/wsj/s5/steps/nnet/train.sh
@@ -433,18 +433,6 @@ else
       ${bn_dim:+ --bottleneck-dim=$bn_dim} \
       "$cnn_fea" $num_tgt $hid_layers $hid_dim >>$nnet_proto
     ;;
-  cnn2d)
-    delta_order=$([ -z $delta_opts ] && echo "0" || { echo $delta_opts | tr ' ' '\n' | grep "delta[-_]order" | sed 's:^.*=::'; })
-    echo "Debug : $delta_opts, delta_order $delta_order"
-    utils/nnet/make_cnn2d_proto.py $cnn_proto_opts \
-      --splice=$splice --delta-order=$delta_order --dir=$dir \
-      $num_fea >$nnet_proto
-    cnn_fea=$(cat $nnet_proto | grep -v '^$' | tail -n1 | awk '{ print $5; }')
-    utils/nnet/make_nnet_proto.py $proto_opts \
-      --no-smaller-input-weights \
-      ${bn_dim:+ --bottleneck-dim=$bn_dim} \
-      "$cnn_fea" $num_tgt $hid_layers $hid_dim >>$nnet_proto
-    ;;
   lstm)
     utils/nnet/make_lstm_proto.py $proto_opts \
       $num_fea $num_tgt >$nnet_proto
106 changes: 106 additions & 0 deletions egs/wsj/s5/steps/nnet3/xconfig_to_config.py
@@ -0,0 +1,106 @@ (new file)
#!/usr/bin/env python3

# Copyright 2016-2018 Johns Hopkins University (Dan Povey)
#           2016 Vijayaditya Peddinti
#           2017 Google Inc. ([email protected])
# Apache 2.0.

# This is like xconfig_to_configs.py but with a simpler interface; it writes
# to a single named file.


import argparse
import os
import sys
from collections import defaultdict

sys.path.insert(0, 'steps/')
# the following is in case we weren't running this from the normal directory.
sys.path.insert(0, os.path.realpath(os.path.dirname(sys.argv[0])) + '/')

import libs.nnet3.xconfig.parser as xparser
import libs.common as common_lib


def get_args():
    # we add compulsory arguments as named arguments for readability
    parser = argparse.ArgumentParser(
        description="Reads an xconfig file and creates config files "
        "for neural net creation and training",
        epilog='Search egs/*/*/local/{nnet3,chain}/*sh for examples')
    parser.add_argument('--xconfig-file', required=True,
                        help='Filename of input xconfig file')
    parser.add_argument('--existing-model',
                        help='Filename of previously trained neural net '
                        '(e.g. final.mdl) which is useful in case of '
                        'using nodes from list of component-nodes in '
                        'already trained model '
                        'to generate new config file for new model. '
                        'The context info is also generated using '
                        'a model generated by adding final.config '
                        'to the existing model. '
                        'e.g. In Transfer learning: generate new model using '
                        'component nodes in existing model.')
    parser.add_argument('--config-file-out', required=True,
                        help='Filename to write nnet config file.')
    parser.add_argument('--nnet-edits', type=str, default=None,
                        action=common_lib.NullstrToNoneAction,
                        help="""This option is useful in case the network you
                        are creating does not have an output node called
                        'output' (e.g. for multilingual setups). You can set
                        this to an edit-string like: 'rename-node old-name=xxx
                        new-name=output' if node xxx plays the role of the
                        output node in this network. This is only used for
                        computing the left/right context.""")

    print(' '.join(sys.argv), file=sys.stderr)

    args = parser.parse_args()

    return args


def write_config_file(config_file_out, all_layers):
    # config_basename_to_lines is map from the basename of the
    # config, as a string (i.e. 'ref', 'all', 'init') to a list of
    # strings representing lines to put in the config file.
    config_basename_to_lines = defaultdict(list)

    for layer in all_layers:
        try:
            pairs = layer.get_full_config()
            for config_basename, line in pairs:
                config_basename_to_lines[config_basename].append(line)
        except Exception as e:
            print("{0}: error producing config lines from xconfig "
                  "line '{1}': error was: {2}".format(sys.argv[0],
                                                      str(layer), repr(e)),
                  file=sys.stderr)
            # we use raise rather than raise(e) as using a blank raise
            # preserves the backtrace
            raise

    with open(config_file_out, 'w') as f:
        print('# This file was created by the command:\n'
              '# {0} '.format(sys.argv), file=f)
        lines = config_basename_to_lines['final']
        for line in lines:
            print(line, file=f)


def main():
    args = get_args()
    existing_layers = []
    if args.existing_model is not None:
        existing_layers = xparser.get_model_component_info(args.existing_model)
    all_layers = xparser.read_xconfig_file(args.xconfig_file, existing_layers)
    write_config_file(args.config_file_out, all_layers)


if __name__ == '__main__':
    main()


# test:
# (echo 'input dim=40 name=input'; echo 'output name=output input=Append(-1,0,1)') >xconfig; steps/nnet3/xconfig_to_config.py --xconfig-file=xconfig --config-file-out=foo
6 changes: 2 additions & 4 deletions egs/wsj/s5/steps/segmentation/internal/merge_targets.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3

 # Copyright 2017 Vimal Manohar
 # Apache 2.0
@@ -16,8 +16,6 @@
 option.
 """

-from __future__ import print_function
-from __future__ import division
 import argparse
 import logging
 import numpy as np
@@ -111,7 +109,7 @@ def should_remove_frame(row, dim):
     # source[2] = [ 0 0 0 ]
     """
     assert len(row) % dim == 0
-    num_sources = len(row) / dim
+    num_sources = len(row) // dim

     max_idx = np.argmax(row)
     max_val = row[max_idx]
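The `/` to `//` change above is the one behavioral fix in this file: under Python 3, `/` on two ints yields a float, while the downstream code expects num_sources to be an int. A quick illustration (the row length and dim here are made up, not from the script):

```shell
# Python 3 "/" is true division even for ints; "//" is floor division.
python3 - <<'EOF'
row = [0.0] * 9   # e.g. 3 sources, dim 3
dim = 3
print(len(row) / dim)    # 3.0  (a float, breaks integer use downstream)
print(len(row) // dim)   # 3    (an int, as intended)
EOF
```

Dropping the `from __future__` imports is safe once the shebang pins python3, since those imports only existed to give Python 2 the Python 3 semantics.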
6 changes: 3 additions & 3 deletions egs/wsj/s5/utils/data/perturb_data_dir_volume.sh
@@ -52,15 +52,15 @@ for line in sys.stdin.readlines():
   parts = line.strip().split()
   if line.strip()[-1] == '|':
     if re.search('sox --vol', ' '.join(parts[-11:])):
-      print 'true'
+      print('true')
       sys.exit(0)
   elif re.search(':[0-9]+$', line.strip()) is not None:
     continue
   else:
     if ' '.join(parts[1:3]) == 'sox --vol':
-      print 'true'
+      print('true')
       sys.exit(0)
-print 'false'
+print('false')
 "` || exit 1
 
 if $volume_perturb_done; then
11 changes: 4 additions & 7 deletions egs/wsj/s5/utils/nnet/gen_dct_mat.py
@@ -16,8 +16,8 @@
 # limitations under the License.
 
 # ./gen_dct_mat.py
-# script generates matrix with DCT transform, which is sparse 
-# and takes into account that data-layout is along frequency axis, 
+# script generates matrix with DCT transform, which is sparse
+# and takes into account that data-layout is along frequency axis,
 # while DCT is done along temporal axis.
 
 from __future__ import division
@@ -29,10 +29,7 @@
 from optparse import OptionParser
 
 def print_on_same_line(text):
-  if (sys.version_info > (3,0)):
-    print(text, end=' ')
-  else:
-    print text,
+  print(text, end=' ')
 
 parser = OptionParser()
 parser.add_option('--fea-dim', dest='dim', help='feature dimension')
@@ -69,7 +66,7 @@ def print_on_same_line(text):
     if(n==timeContext-1):
       print_on_same_line((dim-m-1)*'0 ')
     print()
   print()
 
 print(']')
5 changes: 1 addition & 4 deletions egs/wsj/s5/utils/nnet/gen_hamm_mat.py
@@ -27,10 +27,7 @@
 from optparse import OptionParser
 
 def print_on_same_line(text):
-  if (sys.version_info > (3,0)):
-    print(text, end=' ')
-  else:
-    print text,
+  print(text, end=' ')
 
 parser = OptionParser()
 parser.add_option('--fea-dim', dest='dim', help='feature dimension')
5 changes: 1 addition & 4 deletions egs/wsj/s5/utils/nnet/gen_splice.py
@@ -26,10 +26,7 @@
 from optparse import OptionParser
 
 def print_on_same_line(text):
-  if (sys.version_info > (3,0)):
-    print(text, end=' ')
-  else:
-    print text,
+  print(text, end=' ')
 
 parser = OptionParser()
 parser.add_option('--fea-dim', dest='dim_in', help='feature dimension')