Understanding numLeaves and numGaussians


Background

This article introduces the leaf-count and Gaussian-count concepts behind the decision trees used in speech recognition, and compares how CMU Sphinx and Kaldi handle them.

Main text

a) I observed very large values for the variables numLeaves=2500 and numGauss=15000 in timit/s5/run.sh. Do they correspond to the number of tied states (senones) and the number of mixture Gaussians, respectively? If so, in Sphinx we used to use much lower values for a database the size of TIMIT, i.e. numTiedStates < 1000 and numMixtureGaussians between 8 and 32.

b) Also, I observed that the number of training iterations is fixed here in advance, unlike Sphinx, where training stops once the convergence ratio of the likelihoods no longer improves beyond a preset threshold.

Please explain the differences in point a) and the logic behind hardcoding the number of iterations in point b).
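
The thread never answers point b) directly, but the contrast is easy to state in code. Below is a minimal Python sketch of the two stopping rules, an illustration rather than code from either toolkit: Kaldi recipes run a preset number of EM iterations, while Sphinx stops once the relative likelihood gain falls below a threshold. The function names are invented, train_one_iter is a hypothetical placeholder for one EM pass that returns the new total log-likelihood, and the default of 35 iterations is only assumed to mirror a typical Kaldi recipe setting.

    def train_fixed_iters(train_one_iter, num_iters=35):
        # Kaldi-style: always run a preset number of EM iterations.
        loglike = None
        for _ in range(num_iters):
            loglike = train_one_iter()
        return loglike

    def train_to_convergence(train_one_iter, ratio_thresh=1e-4, max_iters=100):
        # Sphinx-style: stop once the relative gain in log-likelihood
        # drops below a preset convergence threshold.
        prev = train_one_iter()
        for _ in range(max_iters):
            cur = train_one_iter()
            if (cur - prev) / abs(prev) < ratio_thresh:
                return cur
            prev = cur
        return prev

A fixed count keeps recipes reproducible and lets the schedule of Gaussian splitting and realignment be laid out in advance, which is presumably the logic behind hardcoding it.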

numGauss is the total number of Gaussians over all leaves (the number of Gaussians per leaf is not fixed but varies according to data-count^0.2). Possibly numLeaves and numGauss were not tuned well for TIMIT.

You could try fewer and see if it helps. But I don't recommend using TIMIT; I prefer RM for small-scale debugging, or WSJ for larger scale.
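
To illustrate the allocation rule just described, here is a small Python sketch that spreads a total Gaussian budget across leaves in proportion to count^0.2. It is an approximation of the idea, not Kaldi's actual code; the function name and the rounding details are invented.

    def allocate_gaussians(counts, num_gauss_total, power=0.2, min_gauss=1):
        # Weight each leaf by a small power of its data count, then
        # scale so the weights sum to the requested Gaussian total.
        weights = [c ** power for c in counts]
        total = sum(weights)
        return [max(min_gauss, round(num_gauss_total * w / total))
                for w in weights]

    # Example: four leaves with very different data counts.
    print(allocate_gaussians([100, 800, 3200, 25600], num_gauss_total=100))
    # The 0.2 power flattens the allocation: a leaf with 32x the data
    # of another gets only 32 ** 0.2 = 2x as many Gaussians.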

1) Does the #pdfs value obtained from gmm-info mean numLeaves? If yes, why was #pdfs = 1722 after training when I had set numLeaves to 2500?

After doing the tree splitting, it clusters the leaves, so the number gets a little smaller.

2) Does numGauss per leaf vary across leaves? 3) http://www.cs.toronto.edu/~fritz/absps/icassp12_dbn.pdf suggests that correlated features such as log filterbanks work better than MFCCs. Why, then, do we try to decorrelate the computed MFCCs using the LDA+MLLT transform, as described at http://kaldi.sourceforge.net/dnn2.html?

That's an interaction with the pre-training of DNNs. The dnn2 recipe does not use pre-training, so it's not an issue.

(a) In Sphinx-3, if we set numLeaves (tied states) to 2500, it trains exactly 2500 leaves, but Kaldi trains 1722. (b) The same argument holds for the numGauss of a leaf, which, unlike in Sphinx-3, is not exactly the same for all leaves in Kaldi. So do these two observations mean that Kaldi does not rigidly follow these two user-set parameters, and instead decides the best fit according to the data?

The number of leaves is slightly less than what you set because it clusters the leaves after splitting the tree to the specified size. numGauss is specified as a total across all states; the number of Gaussians for each state (pdf-id) is allocated according to a small power (0.2) of its data count.
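
Putting the two answers together with the numbers from this thread: numLeaves=2500 is the size the tree is split to, and the post-split clustering then merged similar leaves down to the 1722 pdfs reported by gmm-info; the numGauss=15000 total is in turn spread over those 1722 pdfs, an average of roughly 15000 / 1722 ≈ 8.7 Gaussians per pdf, with each pdf's actual share growing with a small power of its data count.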

I see. So, as I understand from the reply, the difference lies in the fact that in Kaldi the tree is "split until the specified size", whereas in Sphinx-3 the tree is grown fully and then "pruned to leave as many leaves as specified". The Sphinx information was taken from the "Pruning Decision Trees" section at http://www.speech.cs.cmu.edu/sphinxman/fr4.html.
