Questions about DNN model initialization in Kaldi


I am a new user of Kaldi and have little background knowledge of the toolkit. Right now I am setting up a benchmark to compare Kaldi with our own DNN training toolkit, so I am considering the following quick plan for the comparison:
1) convert our data alignment files to Kaldi format
2) do DNN training with Kaldi
3) convert Kaldi-trained DNN model back to our own DNN format for testing

However, when I took a look at the DNN training script for Dan's implementation in swbd, 'steps/nnet2/train_pnorm_accel2.sh', I noticed that the initialization of the shallow network below does not seem to be done from the alignments, and that a tree is needed.

nnet-am-init $alidir/tree $lang/topo "nnet-init --srand=$srand $dir/nnet.config -|" $dir/0.mdl

I am wondering, for Dan's DNN implementation, how the DNN model is initialized on the algorithm side in Kaldi, and why the tree is needed. The "Dan's DNN implementation" section on the Kaldi homepage does not have enough information about this.
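For reference, the piped command above can be read as two separate steps. The sketch below only unrolls the pipe, using the script's own variables ($srand, $dir, $alidir, $lang) and a hypothetical intermediate file 0.raw: nnet-init builds the raw network from nnet.config, and nnet-am-init then wraps it together with a transition model built from the tree and the HMM topology.

# Sketch only: the same initialization with the pipe unrolled.
nnet-init --srand=$srand $dir/nnet.config $dir/0.raw          # raw network from the config file
nnet-am-init $alidir/tree $lang/topo $dir/0.raw $dir/0.mdl    # attach a transition model built from tree + topo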


It needs that stuff because the model files contain the transition model as well as the actual neural net. In most situations the transition model is not used, though. Getting rid of this might require writing new binaries.
Also, the nnet2 setup uses nonlinearity types that probably do not exist in your setup (p-norm, normalize layer, splicing layers). If it is a speech task, it would probably be much less work to just train a Kaldi acoustic model, and the performance will probably be better as well.
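If it helps to see what such a model file contains, the nnet2 tools include an info printer; the sketch below assumes the nnet-am-info binary from your Kaldi build and reuses the $dir/0.mdl path from the command quoted earlier.

# Prints the model dimensions and the list of components (including the
# transition-model information) for a nnet2 acoustic model.
nnet-am-info $dir/0.mdl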

 

What are 'splicing layers'? They do not seem to be mentioned in the documentation, or maybe I misunderstood.


SpliceComponent. For this type of thing you will have to search the code, not the documentation; the documentation is only very high-level.
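As an illustration (the parameter names and values below are my assumption about what a generated nnet.config line looks like, not copied from the recipe), input splicing appears as a component line such as:

# Hypothetical config line: splice 4 frames of left and right context inside the
# network, so a 40-dim input frame becomes a 40*(4+1+4) = 360-dim spliced vector.
SpliceComponent input-dim=40 left-context=4 right-context=4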

Does your implementation only support the configuration you mentioned (p-norm, normalize layer, splice layer), or does it also support a pretty standard DNN configuration?


It does support more standard configurations, but the performance of those is not always quite as good, and it hasn't been tuned as recently. Actually, ReLUs sometimes give better performance than p-norm, but we always train them with the normalization layer to ensure stability during training, and you can't test without that layer being included. So without adding that to your toolkit you wouldn't be able to do the comparison. Anyway, that would probably be the least of your problems.

So in the case of ReLU, you are saying that without normalization the training is not stable. We have been training ReLU nets without any normalization and did not see the stability issue. We tried mean-normalized SGD as well and it did not turn out to help. So does the stability issue without the normalization layer have something to do with your parallelization and optimization methods (parameter averaging and natural gradient)?

I am OK with a ReLU net with a normalization layer, decoded with our decoder. In decoding, the normalization layer should be treated as a standard layer, without extra support needed on our decoder side. Is there any existing recipe with a ReLU net? It is OK if it is not well tuned.

Regarding splicing layers, I know this is to handle the left and right feature context, but looking at nnet.config this still looks confusing to me. We use a pretty standard approach that directly feeds features with context to the input layer. At this point I am trying to quickly get some idea without needing to look into the source code (I am pretty new to Kaldi); I want to see whether extra support is needed from our decoder for splicing layers. Overall, my first goal is to set up a plan quickly. Moving forward, I will certainly need to look into more code details.

By the way, do you have a sample run with all output dirs for either your wsj or switchboard DNN recipe somewhere that I can access?


Not really. The natural gradient actually improves the stability.
People who train ReLUs with many layers usually have to resort to some kind of trick to stabilize it; this happens to be the trick we have chosen.


In the nnet2 code the splicing is done internally to the network, but you could just discard the SpliceComponent and do it externally.
However, the current ReLU recipes that we are using (e.g. steps/nnet2/train_multisplice_accel2.sh if you set --pnorm-input-dim and --pnorm-output-dim to the same value) actually also do splicing at intermediate layers, so your framework wouldn't be able to handle it.
We don't have any ReLU recipes currently that don't do that.
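For concreteness, that remark corresponds to an invocation roughly like the one below; the data/alignment directories and the dimension value are placeholders, and the only point being illustrated is setting the two p-norm dimensions equal, which selects the ReLU-style configuration in this recipe.

# Hypothetical invocation (paths and dimensions are placeholders):
steps/nnet2/train_multisplice_accel2.sh \
  --pnorm-input-dim 2000 --pnorm-output-dim 2000 \
  data/train data/lang exp/tri4b_ali exp/nnet2_relu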


You can look on kaldi-asr.org and see if there is something.

You obviously have a lot of questions, because you've chosen to use Kaldi in a way that is inherently quite difficult. I'm a busy person
and I'm not going to be able to hold your hand and take you through all the things you need to do.


I understand; I will not expect that much help when I really start playing with the tool. Currently, all these general questions are just to estimate the effort we would need for the work.

<< However the current ReLU recipes that we are using (e.g. steps/nnet2/train_multisplice_accel2.sh
<< if you set --pnorm-input-dim and --pnorm-output-dim to the same value) actually also do splicing
<< at intermediate layers so your framework wouldn't be able to handle it.
<< We don't have any ReLU recipes currently, that don't do that.

Just to confirm: I expect the modification of the ReLU multi-splice recipe (so that the internal splicing of the intermediate layers is not done) to be just at the shell-script level with configuration changes. Is that right, or is it at the C++ code level as well?



Yes, the changes are at the command-line level. You would just remove all the splicing specifications that say layer1/xxx and layer2/xxx and so on, leaving only the layer0 one.
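In terms of the script's splicing specification, the change would look roughly like this; the context offsets below are placeholders, not the recipe defaults.

# Hypothetical before/after for the --splice-indexes option:
#   before: --splice-indexes "layer0/-2:-1:0:1:2 layer1/-2:2 layer2/-4:4"
#   after:  --splice-indexes "layer0/-2:-1:0:1:2"
# Only the layer0 (input) splicing remains, so the hidden layers stay standard.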

Regarding setting a ReLU activation component instead of p-norm: from what you mentioned earlier in this thread (--pnorm-input-dim and --pnorm-output-dim set to the same value), I can add the following in nnet.config:

PnormComponent input-dim=$pnorm_input_dim output-dim=$pnorm_input_dim p=?

I believe the p value does not really matter in this case, or do I even need to specify p=? in the line above?

In the meantime, I am wondering how such a p-norm setting (same input and output dim) ends up being the same as a ReLU activation, given y = max(0, x) for ReLU while y = (|x|^p)^(1/p) for such a p-norm?


 

No, what I was talking about related to the TDNN scripts, which use the RectifiedLinearComponent in that case. You have to use the RectifiedLinearComponent if you want ReLU.
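So, concretely, instead of the PnormComponent line above, the nnet.config would contain something like the lines below; $hidden_dim is a placeholder, and pairing the ReLU with a NormalizeComponent follows the earlier point that the ReLU layers are trained together with a normalization layer.

# Hypothetical nnet.config fragment for one ReLU hidden layer:
RectifiedLinearComponent dim=$hidden_dim
NormalizeComponent dim=$hidden_dim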

