锐英源软件

Online Recognition with Voxforge Models: Buffer Handling


Hi, I'm trying to build a real-time decoder (based on the online2 setup) with models from the Voxforge recipe, but at runtime all the returned results (strings) are empty.
Currently I'm using the models obtained at the "MMI on top of LDA+MLLT" step of the Voxforge recipe.
Do I need any additional steps to use these models in online decoding?

 

I didn't get this message by email; there seems to be a problem with the SourceForge forums right now.
Anyway, that should work. If you show the command line you used and what the output was, the problem may be obvious.
If your signal's sampling rate is different from what the configs expect, you cannot just change the sample rate in the config; you have to resample (subsample) the signal itself.
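As a sketch of what subsampling the signal involves, assume the capture device delivers an integer multiple of the expected rate (for example 48 kHz audio for a 16 kHz model). The approach is to low-pass filter below the new Nyquist frequency and then keep every factor-th sample. Dropping samples without the filter would fold energy above 8 kHz back into the speech band. The function name `DecimateByInteger` and the filter choices (65-tap Hann-windowed sinc, cutoff at 0.9 of the new Nyquist frequency) are illustrative assumptions, not part of Kaldi. Kaldi's own `LinearResample` class (src/feat/resample.h) or sox can be used instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Kaldi API): downsample `in` by an integer
// `factor`, e.g. 48 kHz -> 16 kHz with factor 3. A low-pass FIR filter with
// its cutoff below the new Nyquist frequency is applied first, so content
// above it does not alias into the band the model was trained on.
std::vector<float> DecimateByInteger(const std::vector<float>& in, int factor,
                                     int half_taps = 32) {
  assert(factor >= 1 && half_taps >= 1);
  if (factor == 1) return in;
  const double kPi = 3.14159265358979323846;
  // Cutoff in cycles per input sample: 0.5 / factor is the new Nyquist
  // frequency; 0.45 / factor leaves room for the filter's transition band.
  const double cutoff = 0.45 / factor;
  const int num_taps = 2 * half_taps + 1;
  std::vector<double> h(num_taps);
  double sum = 0.0;
  for (int k = 0; k < num_taps; ++k) {
    const int n = k - half_taps;
    // Ideal low-pass impulse response 2*fc*sinc(2*fc*n) ...
    const double x = kPi * 2.0 * cutoff * n;
    const double ideal = (n == 0) ? 2.0 * cutoff : 2.0 * cutoff * std::sin(x) / x;
    // ... tapered by a Hann window to limit stopband ripple.
    const double window = 0.5 - 0.5 * std::cos(2.0 * kPi * k / (num_taps - 1));
    h[k] = ideal * window;
    sum += h[k];
  }
  for (double& c : h) c /= sum;  // normalise to unity gain at DC

  std::vector<float> out;
  out.reserve(in.size() / factor + 1);
  // Evaluate the (symmetric) filter only at the samples that are kept.
  for (std::size_t i = 0; i < in.size(); i += factor) {
    double acc = 0.0;
    for (int k = 0; k < num_taps; ++k) {
      // Input index aligned with tap k; samples outside the buffer count as 0.
      const long j = static_cast<long>(i) + k - half_taps;
      if (j >= 0 && j < static_cast<long>(in.size())) acc += h[k] * in[j];
    }
    out.push_back(static_cast<float>(acc));
  }
  return out;
}
```

For example, `DecimateByInteger(samples, 3)` turns 4800 samples at 48 kHz (100 ms) into 1600 samples at 16 kHz. The sketch treats each buffer in isolation, so the filter sees zeros beyond the buffer edges. A streaming version would keep the last `2 * half_taps` input samples between microphone chunks.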

 

Problem solved by subsampling the signal.
But I have another question: is there a way to get the bit depth from the acoustic model?

 


I guess you want to normalize the amplitude.
Normalizing the amplitude of the input is important, at least to get it into the right range. You would have to derive this from the training wav data. I don't think the existing Kaldi tools will do this from the command line; you could maybe use sox or some other tool. Vijay, how did you do this?
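On the bit-depth question: as far as I know, neither the acoustic model nor the online2 configs store a bit depth. What matters is that live samples are on the same scale as the training data. Kaldi reads 16-bit wav files as floats holding the raw integer values (-32768..32767, not rescaled to [-1, 1]). Below is a minimal sketch of the normalization suggested above. It assumes 16-bit mono PCM and a target RMS level measured beforehand on the training wavs. The function names and the RMS-in-dBFS criterion are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// RMS level of signed 16-bit PCM in dB relative to full scale (32768).
// Returns -infinity for an empty or all-zero buffer.
double RmsDbfs(const std::vector<int16_t>& pcm) {
  double sum_sq = 0.0;
  for (int16_t s : pcm) sum_sq += static_cast<double>(s) * s;
  const double rms = pcm.empty() ? 0.0 : std::sqrt(sum_sq / pcm.size());
  if (rms == 0.0) return -std::numeric_limits<double>::infinity();
  return 20.0 * std::log10(rms / 32768.0);
}

// Scale `pcm` so that its RMS level becomes `target_dbfs` (assumed to be
// measured beforehand on the training wavs). Results are clipped to the
// int16 range instead of being allowed to wrap around.
std::vector<int16_t> NormalizeToDbfs(const std::vector<int16_t>& pcm,
                                     double target_dbfs) {
  assert(std::isfinite(target_dbfs));
  const double current = RmsDbfs(pcm);
  if (!std::isfinite(current)) return pcm;  // silence: nothing to scale
  const double gain = std::pow(10.0, (target_dbfs - current) / 20.0);
  std::vector<int16_t> out(pcm.size());
  for (std::size_t i = 0; i < pcm.size(); ++i) {
    const double v = std::round(pcm[i] * gain);
    out[i] = static_cast<int16_t>(std::max(-32768.0, std::min(32767.0, v)));
  }
  return out;
}
```

Normalizing every short microphone buffer independently would also amplify silence and background noise. In practice it is probably better to measure the level over a longer stretch of speech and then apply one fixed gain.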

I'm setting the input source to the same amplitude format as the acoustic model (16-bit PCM).
The problem I'm encountering now is the conversion from the int16 sample buffer (coming directly from the microphone) to Vector<BaseFloat>. I convert sample by sample, but the resulting vector is dirty: it has peaks that are not present in the original signal.
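Here is a sketch of the int16 to float conversion, assuming the capture buffer holds interleaved signed 16-bit samples. Each sample is copied with a plain cast and no rescaling, to match how Kaldi reads 16-bit training wavs. The count passed in is the number of frames, not the number of bytes. Passing a byte count, or reading an interleaved stereo buffer as mono, are common ways to end up with garbage in the vector. `std::vector<float>` stands in for Kaldi's `Vector<BaseFloat>` so the example is self-contained. The name `Int16ToFloat` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy one channel of an interleaved signed 16-bit
// capture buffer into floats. `num_frames` is the number of samples per
// channel, NOT the number of bytes returned by the capture API.
// Values are cast without rescaling, so they stay in -32768..32767 like
// the samples Kaldi reads from 16-bit training wavs.
std::vector<float> Int16ToFloat(const int16_t* buf, std::size_t num_frames,
                                int num_channels = 1, int channel = 0) {
  assert(num_channels >= 1 && channel >= 0 && channel < num_channels);
  assert(buf != nullptr || num_frames == 0);
  std::vector<float> out(num_frames);
  for (std::size_t i = 0; i < num_frames; ++i) {
    // Every int16 value is exactly representable as a float.
    out[i] = static_cast<float>(buf[i * num_channels + channel]);
  }
  return out;
}
```

With Kaldi, the same loop would fill a `Vector<BaseFloat> wave(num_frames)` via `wave(i) = buf[i * num_channels + channel];`. The vector would then be handed to the feature pipeline's `AcceptWaveform(16000, wave)`.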

 

I'm not sure I understand what the problem is, but perhaps it's because unsigned 16-bit values are being used where signed 16-bit values should be, or vice versa?
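To illustrate the signed/unsigned suggestion: suppose raw little-endian bytes are assembled into a 16-bit value that is kept as unsigned. Every negative sample then turns into a large positive one (-1 becomes 65535), which looks exactly like peaks that are not in the original signal. The sketch below decodes a raw byte buffer both ways. The function names are hypothetical, and the byte order (little-endian, as in WAV files) is an assumption about the capture API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assemble the 16-bit value at sample index i from little-endian bytes.
static uint16_t ReadU16LE(const std::vector<uint8_t>& bytes, std::size_t i) {
  return static_cast<uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}

// Correct: interpret the 16 bits as a two's-complement signed sample.
std::vector<float> DecodeS16LE(const std::vector<uint8_t>& bytes) {
  std::vector<float> out(bytes.size() / 2);  // a trailing odd byte is ignored
  for (std::size_t i = 0; i < out.size(); ++i) {
    const int u = ReadU16LE(bytes, i);
    // Values >= 0x8000 are negative in two's complement.
    out[i] = static_cast<float>(u >= 0x8000 ? u - 0x10000 : u);
  }
  return out;
}

// Buggy variant for comparison: keeps the value unsigned, so a sample of
// -1 becomes 65535 and every negative sample becomes a large positive one.
std::vector<float> DecodeU16LEBuggy(const std::vector<uint8_t>& bytes) {
  std::vector<float> out(bytes.size() / 2);
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<float>(ReadU16LE(bytes, i));
  return out;
}
```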

The model and the source audio have the same parameters (samp_freq 16 kHz, 16-bit signed). However, when I copy the samples from the buffer (short in this case) to the Vector<BaseFloat> that is passed to the feature pipeline, the samples in the Vector are distorted despite the cast.
(I have an image of the short signal and the converted signal; if it helps, I will post it.)

Hm. I'd say this is an audio capture issue rather than a Kaldi issue per se. Make sure the vector has the correct dimension.
You could focus on the samples that seem wrong and print them out at various points in your process.
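Following the advice to print the samples that look wrong, two small debugging helpers (hypothetical, not Kaldi functions) can narrow things down. The first compares the converted vector against the int16 buffer it came from and reports the first disagreement or a length mismatch. The second locates the largest sample-to-sample jump, which is usually where a spurious peak sits. If the conversion matches the buffer exactly, the spikes must already be present in the captured data, which points at the capture layer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical debugging helper: compare converted samples with the int16
// buffer they were copied from. Prints and returns the index of the first
// difference (or of the first missing/extra sample if the lengths differ);
// returns -1 when both agree completely.
long FirstMismatch(const std::vector<int16_t>& src,
                   const std::vector<float>& converted) {
  const std::size_t n = std::min(src.size(), converted.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (converted[i] != static_cast<float>(src[i])) {
      std::fprintf(stderr, "sample %zu: source %d, converted %g\n", i,
                   static_cast<int>(src[i]), static_cast<double>(converted[i]));
      return static_cast<long>(i);
    }
  }
  if (src.size() != converted.size()) {
    std::fprintf(stderr, "length mismatch: source %zu, converted %zu\n",
                 src.size(), converted.size());
    return static_cast<long>(n);
  }
  return -1;
}

// Index i with the largest |x[i] - x[i-1]|; a spike that is not in the
// source signal usually shows up here. Returns 0 if x has < 2 samples.
std::size_t LargestJump(const std::vector<float>& x) {
  std::size_t best = 0;
  float best_jump = 0.0f;
  for (std::size_t i = 1; i < x.size(); ++i) {
    const float jump = std::fabs(x[i] - x[i - 1]);
    if (jump > best_jump) {
      best_jump = jump;
      best = i;
    }
  }
  return best;
}
```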

Problem solved. It was an audio capture problem; switching to PortAudio fixed it.

Copyright (c) 2004-2021 锐英源软件