Decoding wav files with online2-wav-nnet2-latgen-faster


I trained a DNN model using wsj/s5/local/nnet2/run_nnet2.sh. In decoding, the log file shows:

nnet-latgen-faster --minimize=false --max-active=7000 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri4b/graph_tglarge/words.txt exp/nnet2_gpu/final.mdl exp/tri4b/graph_tglarge/HCLG.fst "ark,s,cs:apply-cmvn --utt2spk=ark:data/test/split20/1/utt2spk scp:data/test/split20/1/cmvn.scp scp:data/test/split20/1/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/nnet2_gpu/final.mat ark:- ark:- | transform-feats --utt2spk=ark:data/test/split20/1/utt2spk ark:exp/tri4b/decode_tglarge_test/trans.1 ark:- ark:- |" "ark:|gzip -c > exp/nnet2_gpu/decode_tglarge_test/lat.1.gz"

Then I tried to implement a decoder to decode wav files directly, so I created a new directory data/test_online and copied only spk2utt, text, utt2spk, and wav.scp there from data/test/.
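
For reference, a minimal sketch of that preparation step, assuming the same paths as in this thread (the validate call is just an optional sanity check on the new directory):

mkdir -p data/test_online
for f in spk2utt text utt2spk wav.scp; do cp data/test/$f data/test_online/; done
utils/validate_data_dir.sh --no-feats data/test_online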

I prepared the online directory using ./steps/online/nnet2/prepare_online_decoding.sh data/lang_test_tg exp/nnet2_gpu exp/nnet2_online

Finally, I ran ./steps/online/nnet2/decode.sh --nj 20 --cmd utils/run.pl --online false exp/tri4b/graph_tglarge data/test_online exp/nnet2_online/decode_tg_test

However, the run failed with this message in the log:

online2-wav-nnet2-latgen-faster --online=false --do-endpointing=false --config=exp/nnet2_online/conf/online_nnet2_decoding.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 --word-symbol-table=exp/tri4b/graph_tglarge/words.txt exp/nnet2_online/final.mdl exp/tri4b/graph_tglarge/HCLG.fst ark:data/test_online/split20/1/spk2utt 'ark,s,cs:wav-copy scp,p:data/test_online/split20/1/wav.scp ark:- |' 'ark:|gzip -c > exp/nnet2_online/decode_tg_test/lat.1.gz'
wav-copy scp,p:data/test_online/split20/1/wav.scp ark:-
ERROR (online2-wav-nnet2-latgen-faster:NnetComputer():nnet-compute.cc:70) Feature dimension is 13 but network expects 40

I'd appreciate any help to get the feature dimension correct.

I think you need to supply the --mfcc-config option to prepare_online_decoding.sh; you'll see in the example scripts how to do this.
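
For example, a sketch of how that option might be passed, assuming the MFCC config that matches the training features is conf/mfcc_hires.conf (the actual filename depends on the setup):

./steps/online/nnet2/prepare_online_decoding.sh --mfcc-config conf/mfcc_hires.conf data/lang_test_tg exp/nnet2_gpu exp/nnet2_online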

 

I think I have the --mfcc-config option because 'ls exp/nnet2_online/conf/' gave me the following:
mfcc.conf online_nnet2_decoding.conf

This mfcc.conf was copied from conf/mfcc.conf, which has one line in it:
--use-energy=false # only non-default option.

Any other thoughts?

The mfcc.conf defaults to conf/mfcc.conf, but the 40-dimensional MFCCs were generated from a different config, something like mfcc_hires.conf.
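
For context, a high-resolution MFCC config of that kind typically looks roughly like the lines below; the exact values here are an assumption, so check the config that was actually used when the training features were generated:

--use-energy=false
--num-mel-bins=40
--num-ceps=40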

There is no crash so far after I switched to a different mfcc config. However, the decoding results are very bad from the log, outputting either nothing or just a single word 'TO'. Which steps am I missing? Are there any options or parameters I'm using incorrectly?

It could be quite a few things, e.g. a mismatched graph or some other config mismatch.
I suggest that, instead of doing it the way you did, you follow the example scripts more closely. That is, look at one of the decoding parts of the example scripts that decodes the WSJ data using steps/online/nnet2/decode.sh (i.e. using the online2-wav-nnet2-* decoding program), and then just swap out the data directory for your data directory. Then there is less potential for error.

I tried steps/online/nnet2/decode.sh with the following command line:
./steps/online/nnet2/decode.sh --nj 20 --cmd utils/run.pl --online false exp/tri4b/graph_tglarge data/test_online exp/nnet2_online2/decode_tg_test

It still gave me the same results. I suspect it might be a feature mismatch, because the DNN was built from the fMLLR-transformed features in 'tri4b'. My online_nnet2_decoding.conf contains the following:
--feature-type=mfcc
--mfcc-config=conf/mfcc.conf
--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15

Any suggestions? Thanks a lot.

If the DNN is built from fMLLR-transformed features, there is no way this will work.
The online decoding setup is intended for models built specifically for it, using the local/online/run_nnet2.sh scripts found in the various setups.

I looked at local/online/run_nnet2.sh along with prepare_online_decoding.sh. It prepares configurations for mfcc, plp, ivector, etc. However, my DNN model was trained using 'steps/nnet2/train_pnorm_fast.sh' in the recipe wsj/s5/local/nnet2/run_5d.sh.

Can I use online2-wav-nnet2-latgen-faster to decode with this DNN? Do I have to rebuild the DNN using the online scripts in order to use online decoding?

Yes, you have to build it using the online scripts if you want to do online decoding.
The problem is that the regular scripts use features that cannot be computed online in a straightforward way, for instance cepstral mean normalization and fMLLR adaptation.

The online ones are generally a bit better actually (although the setups are not exactly comparable and the offline ones could probably be made better). However, the iVector-based adaptation is not always robust to data that is too different from the training data. I am working on addressing this by excluding silence from the adaptation data, which seems to help.
