Is there a hidden alignment in lattice-to-phone-lattice?


I was trying to get time information from a lattice using lattice-to-ctm-conf. I tried two methods:

1 Apply lattice-to-ctm-conf to the word lattice directly
2 Use lattice-to-phone-lattice to convert the word lattice to a phone lattice first, then apply lattice-to-ctm-conf to get a phone-level CTM

I found that the alignments from the two methods are very different. In general, the alignments from the second method have more accurate time information.

So I wonder: is there a hidden alignment operation in lattice-to-phone-lattice?

The word-alignment information in the lattice is not exact unless you run lattice-align-words. If you look at the scripts get_ctm.sh and get_train_ctm.sh you will see how to do it. The phone-alignment information is always exact.
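As a rough sketch of the step described above, the Python helper below only builds the shell pipeline string that runs lattice-align-words before lattice-to-ctm-conf. The helper name `word_ctm_pipeline`, the paths, and the assumption that `word_boundary.int` sits under `phones/` in the lang directory are illustrative. The real get_ctm.sh also scales and prunes the lattice first, which is left out here.

```python
import shlex


def word_ctm_pipeline(lang_dir, model, lattice_rspecifier, ctm_out, frame_shift=0.01):
    """Build (but do not run) a pipeline: word-align the lattice, then write a CTM.

    All arguments are hypothetical placeholders; adapt them to your own setup.
    """
    align = [
        "lattice-align-words",
        f"{lang_dir}/phones/word_boundary.int",  # assumed standard lang layout
        model,
        lattice_rspecifier,
        "ark:-",
    ]
    to_ctm = [
        "lattice-to-ctm-conf",
        f"--frame-shift={frame_shift}",
        "ark:-",
        ctm_out,
    ]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return " | ".join(shlex.join(stage) for stage in (align, to_ctm))


if __name__ == "__main__":
    print(word_ctm_pipeline("data/lang", "exp/tri3/final.mdl",
                            "ark:gunzip -c exp/tri3/decode/lat.1.gz |",
                            "exp/tri3/decode/words.ctm"))
```

The resulting string can be passed to a shell. If the lang directory has no word_boundary.int, lattice-align-words-lexicon with a lexicon file would take the place of the first stage.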

Yeah, actually I should use lattice-align-words-lexicon. I couldn't get it to work before because I had not converted the line breaks in my lexicon to Unix style (I was running Kaldi on Windows).

Now it works well :)

Just one follow-up question: so there is an alignment procedure in lattice-to-phone-lattice, right? Because you said that the "phone-alignment" information is always exact.
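For anyone hitting the same line-ending problem, a minimal sketch of the conversion is shown below. The function name `normalize_line_endings` and the file paths are hypothetical. It works on raw bytes so that the encoding of the lexicon entries is left untouched.

```python
from pathlib import Path


def normalize_line_endings(src, dst):
    """Copy src to dst with CRLF (and bare CR) line endings converted to LF.

    Returns the number of carriage-return bytes found in the input.
    """
    data = Path(src).read_bytes()
    # Handle Windows-style CRLF first, then any leftover bare CR.
    fixed = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    Path(dst).write_bytes(fixed)
    return data.count(b"\r")


if __name__ == "__main__":
    # Hypothetical file names; point these at your own lexicon.
    n = normalize_line_endings("lexicon_windows.txt", "lexicon_unix.txt")
    print(f"converted {n} carriage returns")
```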


> Yeah, actually I should use lattice-align-words-lexicon. I couldn't get it to work before because I had not converted the line breaks in my lexicon to Unix style (I was running Kaldi on Windows).

I am changing the code to be more tolerant of the \r characters; I will check it in later.

> Just one follow-up question: so there is an alignment procedure in lattice-to-phone-lattice, right? Because you said that the "phone-alignment" information is always exact.

I wouldn't really call it alignment, because what goes on there is very trivial. But yes, in a sense.

 

Just curious, but why does kaldi-latgen generate inexact time information for words? I mean, you must have the exact time information to get the right decoding results. Then why output an inaccurate transition-id sequence directly?

The time information is not exact because of WFST graph compression. When Kaldi compiles the search graph, OpenFst moves word labels back and forth in order to make the graph more compact. This reduces the graph size and speeds up decoding, but the timing of the words is not very accurate, since an output word label can end up in the middle of the word itself. After decoding is complete, Kaldi needs to move the word labels into sync with the phone labels in order to get proper word timing.
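The idea of recovering word timing from the phone labels can be illustrated with a toy example. The sketch below is not Kaldi's actual algorithm. It assumes word-position-dependent phone names with `_B`, `_I`, `_E`, and `_S` suffixes (begin, internal, end, singleton) and an exact per-phone frame alignment, and the function name `word_times_from_phones` is made up for illustration. Because word boundaries are read off the phone sequence, it does not matter where the word label originally sat in the path.

```python
def word_times_from_phones(phone_segments, words, frame_shift=0.01):
    """Derive (word, start_sec, duration_sec) from an exact phone alignment.

    phone_segments: list of (phone, start_frame, num_frames); phones without a
    position suffix (e.g. "SIL") are treated as non-word phones.
    words: the word labels of the path, in order.
    """
    results = []
    remaining = list(words)
    word_start = None
    for phone, start_frame, num_frames in phone_segments:
        position = phone.rsplit("_", 1)[-1] if "_" in phone else ""
        if position in ("B", "S"):
            word_start = start_frame  # a word begins on this phone
        if position in ("E", "S"):
            if word_start is None or not remaining:
                raise ValueError(f"phone {phone!r} does not match the word sequence")
            end_frame = start_frame + num_frames
            results.append((remaining.pop(0),
                            round(word_start * frame_shift, 3),
                            round((end_frame - word_start) * frame_shift, 3)))
            word_start = None
    return results


if __name__ == "__main__":
    segments = [("SIL", 0, 10), ("HH_B", 10, 5), ("AY_E", 15, 12),
                ("SIL", 27, 3), ("DH_B", 30, 4), ("EH_I", 34, 6), ("R_E", 40, 10)]
    for entry in word_times_from_phones(segments, ["hi", "there"]):
        print(entry)
```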

Forgive my stupid question, but based on your words, is it possible to get exact time information if the WFST graph is not compressed? Or can we set a proper set of compression parameters to avoid this problem?

> Forgive my stupid question, but based on your words, is it possible to get exact time information if the WFST graph is not compressed?

Yes.

> Or can we set a proper set of compression parameters to avoid this problem?

You can remove all fstminimizeencoded calls in graph construction. The graph will be bigger and decoding will be slower, though.

 

I'm not sure that this will work, because lattice determinization takes place in all the lattice-creation code, and that will push the transition-ids around relative to the word symbols.
However, you can always run lattice-align-words or lattice-align-words-lexicon to recover the alignment information.
