TransitionModel


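Background

I recently adapted one of the kaldi examples into an online speech recognition server. Once it compiled, I went through the code in more depth and came across TransitionModel. It is initialized from the .mdl file and is passed to the constructors of both SingleUtteranceNnet2Decoder and OnlineSilenceWeighting, which shows that TransitionModel is a key input to speech recognition. The theory below draws on earlier write-ups found online; thanks to their authors.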

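Theory

In kaldi, the HMM model is in practice a TransitionModel object. This object describes the HMM topology of the phones, stores the information relating pdf-ids and transition-ids, and converts between the various identifiers.
TransitionModel is declared and implemented in transition-model.h and transition-model.cc. Before studying it, read and understand the material on hmm-topology.
A few concepts need to be introduced first.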
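              phone:  a phoneme, numbered from 1; phones.txt maps each number to a concrete phone
          HMM-state:  a state in a phone's HMM, numbered from 0
             pdf-id:  the index of a pdf used by the decision tree and the acoustic model, numbered from 0
   transition-state:  a (virtual) state that moves to itself or to other states through arcs, numbered from 1; in some cases it corresponds one-to-one with a pdf-id
   transition-index:  the index of a transition within an HMM state, i.e. the index into HmmTopology::HmmState::transitions, numbered from 0
      transition-id:  a number assigned to every arc of every HMM state, numbered from 1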
   
   A phone, an HMM-state and the pdf-ids (both forward-pdf-id and self-loop-pdf-id) are usually grouped into a tuple, and each tuple maps to one transition-state. A transition-state combined with a specific transition-index maps to one transition-id. The mappings are:
   (phone, HMM-state, forward-pdf-id, self-loop-pdf-id) -> transition-state
   (transition-state, transition-index)                 -> transition-id
  
The reverse mappings also exist:
                      transition-id -> transition-state
                      transition-id -> transition-index
                   transition-state -> phone
                   transition-state -> HMM-state
                   transition-state -> forward-pdf-id
                   transition-state -> self-loop-pdf-id
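
The index arithmetic behind the (transition-state, transition-index) <-> transition-id mappings can be sketched in a few lines. The following is a simplified, self-contained illustration and not kaldi's code: the name ToyTransitionIndex and its constructor argument are hypothetical, and the only assumption carried over is that each transition-state owns a contiguous block of transition-ids, with both numbered from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the transition-state / transition-id tables.
// Illustrative sketch only; this is not kaldi's TransitionModel.
struct ToyTransitionIndex {
  // First transition-id of each transition-state, indexed by transition-state.
  // Slot 0 is unused and one sentinel entry is appended at the end.
  std::vector<int> state2id;
  // Transition-state of each transition-id, indexed by transition-id.
  // Slot 0 is unused because transition-ids start at 1.
  std::vector<int> id2state;

  // num_transitions[i] is the number of arcs leaving transition-state i + 1.
  explicit ToyTransitionIndex(const std::vector<int> &num_transitions) {
    state2id.push_back(0);
    id2state.push_back(0);
    int next_id = 1;
    for (std::size_t s = 0; s < num_transitions.size(); ++s) {
      state2id.push_back(next_id);
      for (int t = 0; t < num_transitions[s]; ++t)
        id2state.push_back(static_cast<int>(s) + 1);
      next_id += num_transitions[s];
    }
    state2id.push_back(next_id);  // sentinel: one past the last transition-id
  }

  int NumTransitionIds() const { return static_cast<int>(id2state.size()) - 1; }

  int NumTransitionIndices(int trans_state) const {
    return state2id[trans_state + 1] - state2id[trans_state];
  }

  // (transition-state, transition-index) -> transition-id
  int PairToTransitionId(int trans_state, int trans_index) const {
    assert(trans_index >= 0 && trans_index < NumTransitionIndices(trans_state));
    return state2id[trans_state] + trans_index;
  }

  // transition-id -> transition-state
  int TransitionIdToTransitionState(int trans_id) const {
    assert(trans_id >= 1 && trans_id <= NumTransitionIds());
    return id2state[trans_id];
  }

  // transition-id -> transition-index
  int TransitionIdToTransitionIndex(int trans_id) const {
    return trans_id - state2id[TransitionIdToTransitionState(trans_id)];
  }
};
```

For example, with three transition-states that have 2, 2 and 1 outgoing arcs, the pair (transition-state 2, transition-index 1) maps to transition-id 4, and transition-id 4 maps back to transition-state 2 and transition-index 1.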
The definition of TransitionModel in kaldi is shown below; the code has been modified slightly to make it easier to read and understand.

  class TransitionModel {
   public:
    TransitionModel() { }

    void Read(std::istream &is, bool binary);
    void Write(std::ostream &os, bool binary) const;

    /// return reference to HMM-topology object.
    const HmmTopology &GetTopo() const { return topo_; }

    /// \name Integer mapping functions
    /// @{

    int32 TupleToTransitionState(int32 phone, int32 hmm_state, int32 pdf, int32 self_loop_pdf) const;
    int32 PairToTransitionId(int32 trans_state, int32 trans_index) const;
    int32 TransitionIdToTransitionState(int32 trans_id) const;  // return id2state_[trans_id];
    int32 TransitionIdToTransitionIndex(int32 trans_id) const;
    int32 TransitionStateToPhone(int32 trans_state) const;  // return tuples_[trans_state-1].phone;
    int32 TransitionStateToHmmState(int32 trans_state) const;
    int32 TransitionStateToForwardPdfClass(int32 trans_state) const;
    int32 TransitionStateToSelfLoopPdfClass(int32 trans_state) const;
    int32 TransitionStateToForwardPdf(int32 trans_state) const;
    int32 TransitionStateToSelfLoopPdf(int32 trans_state) const;
    // Returns the self-loop transition-id, or zero if this state doesn't have a self-loop.
    int32 SelfLoopOf(int32 trans_state) const;

    inline int32 TransitionIdToPdf(int32 trans_id) const;  // return id2pdf_id_[trans_id];
    int32 TransitionIdToPhone(int32 trans_id) const;  // return tuples_[id2state_[trans_id]-1].phone;
    int32 TransitionIdToPdfClass(int32 trans_id) const;
    int32 TransitionIdToHmmState(int32 trans_id) const;

    /// @}

    /// Returns the total number of transition-ids (note, these are one-based).
    inline int32 NumTransitionIds() const { return id2state_.size()-1; }

    /// Returns the number of transition-indices for a particular transition-state.
    /// Note: "Indices" is the plural of "index".   Index is not the same as "id",
    /// here.  A transition-index is a zero-based offset into the transitions
    /// out of a particular transition state.
    int32 NumTransitionIndices(int32 trans_state) const {
      return state2id_[trans_state+1]-state2id_[trans_state];
    }

    /// Returns the total number of transition-states (note, these are one-based).
    int32 NumTransitionStates() const { return tuples_.size(); }

    // NumPdfs() actually returns the highest-numbered pdf we ever saw, plus one.
    // In normal cases this should equal the number of pdfs in the system, but if you
    // initialized this object with fewer than all the phones, and it happens that
    // an unseen phone has the highest-numbered pdf, this might be different.
    int32 NumPdfs() const { return num_pdfs_; }

    BaseFloat GetTransitionLogProb(int32 trans_id) const {
      return log_probs_(trans_id);
    }

   private:
    struct Tuple {
      int32 phone;
      int32 hmm_state;
      int32 forward_pdf;
      int32 self_loop_pdf;
      Tuple() { }
      Tuple(int32 phone, int32 hmm_state, int32 forward_pdf, int32 self_loop_pdf):
        phone(phone), hmm_state(hmm_state), forward_pdf(forward_pdf), self_loop_pdf(self_loop_pdf) { }
    };

    HmmTopology topo_;

    /// Tuples indexed by transition state minus one;
    /// the tuples are in sorted order which allows us to do the reverse mapping from
    /// tuple to transition state
    std::vector<Tuple> tuples_;

    /// Gives the first transition_id of each transition-state; indexed by
    /// the transition-state.  Array indexed 1..num-transition-states+1 (the last one
    /// is needed so we can know the num-transitions of the last transition-state).
    std::vector<int32> state2id_;

    /// For each transition-id, the corresponding transition
    /// state (indexed by transition-id).
    std::vector<int32> id2state_;

    /// For each transition-id, the corresponding pdf-id (indexed by transition-id).
    std::vector<int32> id2pdf_id_;

    /// For each transition-id, the corresponding log-prob.  Indexed by transition-id.
    Vector<BaseFloat> log_probs_;

    /// For each transition-state, the log of (1 - self-loop-prob).  Indexed by
    /// transition-state.
    Vector<BaseFloat> non_self_loop_log_probs_;

    /// This is actually one plus the highest-numbered pdf we ever got back from the
    /// tree (but the tree numbers pdfs contiguously from zero so this is the number
    /// of pdfs).
    int32 num_pdfs_;
  };
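
The comment on tuples_ notes that keeping the tuples sorted is what makes the tuple -> transition-state lookup possible. A minimal sketch of that idea, assuming a binary search over the sorted vector, is given here. ToyTuple and ToyTupleToTransitionState are hypothetical names, and returning 0 for a missing tuple is a choice made for this sketch rather than a description of how kaldi reports the error.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the private Tuple struct.
struct ToyTuple {
  int phone;
  int hmm_state;
  int forward_pdf;
  int self_loop_pdf;

  // Lexicographic ordering over the four fields.
  bool operator<(const ToyTuple &o) const {
    return std::tie(phone, hmm_state, forward_pdf, self_loop_pdf) <
           std::tie(o.phone, o.hmm_state, o.forward_pdf, o.self_loop_pdf);
  }
  bool operator==(const ToyTuple &o) const {
    return std::tie(phone, hmm_state, forward_pdf, self_loop_pdf) ==
           std::tie(o.phone, o.hmm_state, o.forward_pdf, o.self_loop_pdf);
  }
};

// Returns the one-based transition-state of `t`, or 0 if `t` is not present.
// `sorted_tuples` must be sorted with operator<; element i holds the tuple of
// transition-state i + 1, mirroring "indexed by transition state minus one".
inline int ToyTupleToTransitionState(const std::vector<ToyTuple> &sorted_tuples,
                                     const ToyTuple &t) {
  assert(std::is_sorted(sorted_tuples.begin(), sorted_tuples.end()));
  std::vector<ToyTuple>::const_iterator it =
      std::lower_bound(sorted_tuples.begin(), sorted_tuples.end(), t);
  if (it == sorted_tuples.end() || !(*it == t)) return 0;
  return static_cast<int>(it - sorted_tuples.begin()) + 1;
}
```

The opposite direction needs no search at all: the tuple of transition-state s is simply element s - 1 of the vector.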

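The HMM model actually written to a model file (e.g. final.mdl) is a TransitionModel object. Not every member is written to the file, however: only topo_, tuples_ and log_probs_ are stored. All other members are computed afterwards. The table below summarizes several of the member variables.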

Table: TransitionModel members
In the table, "tr_state" stands for transition-state.
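
As one concrete case of a member that is computed rather than stored, non_self_loop_log_probs_ is documented in the class as the log of (1 - self-loop-prob) for each transition-state. The arithmetic can be sketched as follows. The function name ToyNonSelfLoopLogProb is hypothetical, and the sketch assumes the self-loop probability is strictly below 1; real code has to handle degenerate probabilities, which this version only asserts against.

```cpp
#include <cassert>
#include <cmath>

// Toy computation of log(1 - P(self-loop)) from log P(self-loop).
// If the state has no self-loop, all probability mass leaves the state,
// so the result is log(1) = 0.
inline double ToyNonSelfLoopLogProb(bool has_self_loop,
                                    double self_loop_log_prob) {
  if (!has_self_loop) return 0.0;
  assert(self_loop_log_prob < 0.0);  // assumes P(self-loop) < 1
  // log1p(-p) evaluates log(1 - p) with less rounding error for small p.
  return std::log1p(-std::exp(self_loop_log_prob));
}
```

For a self-loop probability of 0.75, the function returns log(0.25), about -1.386.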

Related tools

Show the transition information of a model

show-transitions phones.txt final.mdl

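Visualize the tree with the dot command (see the graphviz documentation for details)

# -Gsize sets the size, -T sets the output type, which can be png, jpg, pdf, etc.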
draw-tree phones.txt tree | dot -Gsize=80,100 -Tpng > tree.png
draw-tree phones.txt tree | dot -Tpdf > tree.pdf

Convert a GMM model to text format

# write to standard output
gmm-copy --binary=false final.mdl -

Convert the tree to text format

# write to standard output
copy-tree --binary=false tree -

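Debugging and tracing

In some special cases you can add log output guarded by an if statement to find out what is happening. The tools above only show the result after a run has finished; they cannot show the state while the program is running.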
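
A guarded log statement of this kind might look like the sketch below. It writes to a plain std::ostream so that it stays self-contained; inside kaldi code the KALDI_LOG or KALDI_VLOG macros would be the natural choice instead. The helper name LogIfWatchedPhone and the idea of filtering on a single "watched" phone are assumptions made for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical helper: emit a log line only for the phone under investigation,
// so the output is not flooded by every transition-id seen during decoding.
// Returns true if a line was written.
inline bool LogIfWatchedPhone(int trans_id, int phone, int watched_phone,
                              std::ostream &log) {
  if (phone != watched_phone) return false;
  log << "trans_id=" << trans_id << " phone=" << phone << '\n';
  return true;
}
```

In a decoding loop the phone argument would typically come from TransitionIdToPhone(trans_id), and the stream could be std::cerr.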

Copyright (c) 2004-2021 锐英源软件