Improving nnet performance


I'm just putting this here so it gets archived publicly.

Someone asked me "What are your near future plans regarding nnet2 improvements?
Do you plan to add a pretraining stage like in Karel's setup?"

I answered:

There have been some recent improvements with the "p-norm" setup, it's a different nonlinearity. You'll see example scripts. And also of course we've enabled parallel training with GPUs. Now the cross-entropy trained numbers without p-norm are about the same as Karel's (should be a little better with p-norm), although the discriminatively trained numbers are still a little worse-- I need to do some experimentation on this and tune it a bit.
I don't plan to add pre-training because it is very limiting in terms of what you can do next; many types of nonlinearity make it difficult to pre-train (e.g. p-norm). Just yesterday I committed a speed improvement (for decoding without GPUs), regarding using floating-point exp when using floats.
In the near future, i.e. the next couple of months, I (and students working with me) plan to do experiments with multilingual training, bottleneck networks, and possibly convolutional nets.
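For context, the p-norm nonlinearity mentioned above groups the inputs of a layer and emits one output per group; a sketch of the standard definition (symbols are generic, not taken from the toolkit source) is

    y_j = \Big( \sum_{i \in G_j} |x_i|^p \Big)^{1/p}

where G_j is the j-th group of inputs and p = 2 is a common choice.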


I have run into a problem that has blocked me for two days.
When I use train_mono.sh for CallHK training, I use:
feats="ark,s,cs:apply-cmvn --norm-vars=false
--utt2spk=ark:$sdata/JOB/utt2spk scp:$sdata/JOB/cmvn.scp
scp:$sdata/JOB/feats.scp ark:- | add-deltas ark:- ark:- |"
I have computed CMVN for every utterance and checked that the IDs in text and cmvn.scp match those in feats.scp.
Then I split the whole feats.scp, text and cmvn.scp into ten pieces under data/train/split10, i.e. 1, 2, 3, ..., 10,
but when I train the monophone model, I find align logs like this:

WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0120_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0122_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0123_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0124_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0125_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0127_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_115300_a012961_b012962_20041013_115300_a012961_b012962_b_0129_female, producing no output for this utterance
WARNING (apply-cmvn:main():apply-cmvn.cc:67) No normalization statistics available for key 20041013_1153

WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0111_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0112_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0113_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0118_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0134_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0149_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_a_0150_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_b_0001_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_b_0002_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_b_0013_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_a010556_b010557_20040718_143050_a010556_b010557_b_0026_male
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:104) No features for utterance 20040718_143050_

I have checked the split subsets of cmvn.scp, text and feats.scp; their IDs are in one-to-one correspondence, so I really do not know what could be causing these errors. This happens for a great many utterances. Thank you very much.
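One way to narrow down such warnings, since apply-cmvn is given --utt2spk and therefore looks each utterance's speaker ID up in cmvn.scp, is to compare those speaker IDs against the keys cmvn.scp actually contains; a minimal diagnostic sketch (paths assume the data/train layout above):

# speaker IDs that apply-cmvn will look up (second field of utt2spk)
awk '{print $2}' data/train/utt2spk | sort -u > /tmp/spk_keys
# keys actually present in cmvn.scp
awk '{print $1}' data/train/cmvn.scp | sort -u > /tmp/cmvn_keys
# speakers that have no CMVN statistics
comm -23 /tmp/spk_keys /tmp/cmvn_keys | head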


It looks like the problem is that Kaldi expects the keys in cmvn.scp to correspond to speakers, and yours correspond to utterances. If you were using the supplied scripts (steps/compute_cmvn_stats.sh) to compute CMVN, this would not happen.
You also could have found out that there was a problem with your data by using utils/validate_data_dir.sh to check your data directory.
If you want all adaptation to be done per utterance rather than per speaker, you have to make utt2spk and spk2utt be one-to-one maps, with
lines like
foo foo
bar bar
and so on.
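A minimal sketch of setting this up per utterance with the standard scripts (the exp/make_mfcc/train and mfcc directory names are illustrative):

# map every utterance to itself as its own "speaker", then invert the map
awk '{print $1, $1}' data/train/wav.scp > data/train/utt2spk
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
# recompute CMVN stats with the supplied script and sanity-check the directory
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh data/train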
BTW, I notice that your speaker-ids are not prefixes of your utterance-ids;
as mentioned in
http://kaldi.sourceforge.net/data_prep.html
it is better if they are. This could potentially cause problems with splitting later on.
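For example (illustrative IDs, not from this dataset), an utt2spk in which the speaker-id is a prefix of the utterance-id would look like:

sp01_utt001 sp01
sp01_utt002 sp01
sp02_utt001 sp02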


Maybe I have grasped your idea that I should make a map between speaker IDs and utterance IDs, but here they are actually the same thing.
So I can use the stock script steps/compute_cmvn_stats.sh to compute the CMVN stats without any modification, is that right?


I mean I should make utt2spk like this:
uttid1 uttid1
uttid2 uttid2
...
...
where in every line the two fields have the same ID, is that right?
And then I can use:

utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
to create the inverted mapping.
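For a one-to-one utt2spk the inverted spk2utt has the same shape, since each "speaker" owns exactly one utterance (illustrative IDs again):

uttid1 uttid1
uttid2 uttid2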


Yes, that's right, they should be the same. Your current utt2spk probably looks something like

00_a012961_b012962_20041013_115300_a012961_b012962_b_0122_female 20041013_1153

but it should look like

00_a012961_b012962_20041013_115300_a012961_b012962_b_0122_female 00_a012961_b012962_20041013_115300_a012961_b012962_b_0122_female

Thank you for your clear instructions.
One last question: I used the script split_data.sh and found that it does not split the whole cmvn.scp into pieces. Should I add this function to the script to make sure the whole cmvn.scp gets split? At the current stage I have not done so.
After making utt2spk and spk2utt from the first field of wav.scp, everything seems to be OK. The displayed logs look like this:


steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train
/home/gaoxinglong/speech_data/callhk/mfcc
Succeeded creating CMVN stats for train
steps/make_mfcc.sh --nj 10 data/test exp/make_mfcc/test
/home/gaoxinglong/speech_data/callhk/mfcc
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp
indexed by utterance.
Succeeded creating MFCC features for test
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test
/home/gaoxinglong/speech_data/callhk/mfcc
Succeeded creating CMVN stats for test
fix_data_dir.sh: kept all 83944 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/train_mono.sh --nj 10 data/train data/lang exp/mono0a
steps/train_mono.sh: Initializing monophone system.
gmm-init-mono
compile-train-graphs
steps/train_mono.sh: Compiling training graphs
align-equal-compiled and gmm-acc-stats-ali
steps/train_mono.sh: Aligning data equally (pass 0)
gmm-est
steps/train_mono.sh: Pass 1
steps/train_mono.sh: Aligning data

Maybe I have understood what happened with this result: that log came from align.0.?.log, and the log from align.1.?.log looks much better; only a small number of files need retries to achieve alignment.

I have gone through all the steps for training a GMM acoustic model on the CallHK dataset.
I chose 3627 sentences as the test set.
When the monophone training was complete, I decoded with an ARPA LM file of about 576 MB (built from CTS material), but I got a WER of about 99%, while the result given by the standard script is about 80%. That is a significant difference.
Do you think the ARPA file could cause this difference?
And do you know what language material can be used to create an ARPA file that corresponds to the roughly 80% WER?
Thank you.

I normally don't even bother decoding the monophone stage. I would advise doing a couple of iterations of triphone model training and then testing.
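For reference, a minimal sketch of that next stage in the style of the standard example scripts (the exp/tri1 and data/lang_test names and the 2000/10000 tree sizes are illustrative, not taken from this thread):

# realign the training data with the monophone model, then train a delta-feature triphone system
steps/align_si.sh --nj 10 data/train data/lang exp/mono0a exp/mono0a_ali
steps/train_deltas.sh 2000 10000 data/train data/lang exp/mono0a_ali exp/tri1
# build the decoding graph and decode the test set with the triphone model
utils/mkgraph.sh data/lang_test exp/tri1 exp/tri1/graph
steps/decode.sh --nj 10 exp/tri1/graph data/test exp/tri1/decode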

Thank you, Dan. The reason I want to test the performance of the monophone system is that I want to check whether my training procedure and data preparation are correct.
I have grasped your ideas and found the language model from the scripts and your personal website. I can see you have built quite a complete and carefully designed pilot system.
My language model is relatively more complicated than yours, so the two cannot be compared at the same level.
