The vystoin open-source project for DNN-based speech recognition


Project Overview

In this notebook, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline!

We begin by investigating the LibriSpeech dataset that will be used to train and evaluate your models. Your algorithm will first convert any raw audio to feature representations that are commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers that are often used for deep learning-based approaches to ASR, you will engage in your own investigations by creating and testing your own state-of-the-art models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.
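
To make the feature-extraction step concrete, here is a minimal sketch of computing MFCC features for one utterance. The wav path below is hypothetical, and it assumes the python_speech_features package is installed (pip install python_speech_features):

from scipy.io import wavfile
from python_speech_features import mfcc

# Read one utterance; LibriSpeech audio is 16 kHz after the flac-to-wav conversion below.
rate, audio = wavfile.read('LibriSpeech/dev-clean/sample.wav')  # hypothetical path

# 13 cepstral coefficients per 25 ms frame, a common starting point for ASR.
features = mfcc(audio, samplerate=rate, numcep=13)
print(features.shape)  # (number of frames, 13)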


Project Instructions

Amazon Web Services
This project requires GPU acceleration to run efficiently. Please refer to the Udacity instructions for setting up a GPU instance for this project, and refer to the project instructions in the classroom for setup.

Local Environment Setup

You should run this project with GPU acceleration for best performance.

  1. Clone the repository, and navigate to the downloaded folder.

git clone https://github.com/udacity/AIND-VUI-Capstone.git
cd AIND-VUI-Capstone

  2. Create (and activate) a new environment with Python 3.5 and the numpy package.
    • Linux or Mac:
  • conda create --name aind-vui python=3.5 numpy
  • source activate aind-vui
    • Windows:
  • conda create --name aind-vui python=3.5 numpy scipy
  • activate aind-vui
  3. Install TensorFlow.
  • Option 1: To install TensorFlow with GPU support, follow the guide to install the necessary NVIDIA software on your system. If you are using the Udacity AMI, you can skip this step and only need to install the tensorflow-gpu package:
  • pip install tensorflow-gpu==1.1.0
  • Option 2: To install TensorFlow with CPU support only,
  • pip install tensorflow==1.1.0
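
Whichever option you choose, a quick check that is not part of the original instructions but can save debugging time later: from inside the aind-vui environment, list the devices TensorFlow can see. With working GPU support, a '/gpu:0' entry should appear alongside '/cpu:0'.

python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"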
  4. Install a few pip packages.

pip install -r requirements.txt

  5. Switch Keras backend to TensorFlow.
    • Linux or Mac:
  • KERAS_BACKEND=tensorflow python -c "from keras import backend"
    • Windows:
  • set KERAS_BACKEND=tensorflow
  • python -c "from keras import backend"
    • NOTE: a Keras/Windows bug may give this error after the first epoch of training model 0: 'rawunicodeescape' codec can't decode bytes in position 54-55: truncated \uXXXX. To fix it:
      • Find the file keras/utils/generic_utils.py that you are using for the capstone project. It should be in your environment under Lib/site-packages. This may vary, but if using miniconda, for example, it might be located at C:/Users/username/Miniconda3/envs/aind-vui/Lib/site-packages/keras/utils.
      • Copy generic_utils.py to OLDgeneric_utils.py, just in case you need to restore it.
      • Open the generic_utils.py file and change this code line:
        marshal.dumps(func.__code__).decode('raw_unicode_escape')
        to this code line:
        marshal.dumps(func.__code__).replace(b'\\', b'/').decode('raw_unicode_escape')
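
After the switch, you can optionally confirm that Keras reports TensorFlow as its backend (a sanity check suggested here; it is not one of the original steps):

python -c "from keras import backend as K; print(K.backend())"

This should print tensorflow.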

  6. Obtain the libav package.
    • Linux: sudo apt-get install libav-tools
    • Mac: brew install libav
    • Windows: Browse to the Libav website
      • Scroll down to "Windows Nightly and Release Builds" and click on the appropriate link for your system (32-bit or 64-bit).
      • Click nightly-gpl.
      • Download the most recent archive file.
      • Extract the file. Move the usr directory to your C: drive.
      • Go back to your terminal window from above, and run:

  • rename C:\usr avconv
  • set PATH=C:\avconv\bin;%PATH%
  7. Obtain the appropriate subsets of the LibriSpeech dataset, and convert all flac files to wav format.
    • Linux or Mac:
  • wget http://www.openslr.org/resources/12/dev-clean.tar.gz
  • tar -xzvf dev-clean.tar.gz
  • wget http://www.openslr.org/resources/12/test-clean.tar.gz
  • tar -xzvf test-clean.tar.gz
  • mv flac_to_wav.sh LibriSpeech
  • cd LibriSpeech
  • ./flac_to_wav.sh
    • Windows: Download the two archive files (dev-clean.tar.gz and test-clean.tar.gz, from the URLs above) via your browser and save them in the AIND-VUI-Capstone directory. Extract them with an application that is compatible with tar and gz, such as 7-zip or WinZip. Then convert the files from your terminal window.
  • move flac_to_wav.sh LibriSpeech
  • cd LibriSpeech
  • powershell ./flac_to_wav.sh
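
If the shell script is inconvenient on your platform, the Python sketch below performs roughly the same conversion (an illustration only, assuming avconv is on your PATH; flac_to_wav.sh remains the supported route):

import os
import subprocess

# Walk the LibriSpeech tree and convert every .flac file to a .wav alongside it.
for root, dirs, files in os.walk('LibriSpeech'):
    for name in files:
        if name.endswith('.flac'):
            flac_path = os.path.join(root, name)
            subprocess.call(['avconv', '-i', flac_path, flac_path[:-5] + '.wav'])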
  8. Create JSON files corresponding to the train and validation datasets.

cd ..
python create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json
python create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json
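
To spot-check the result, read the first line of a description file. Each line is expected to be a standalone JSON object with 'key' (wav path), 'duration', and 'text' fields, the line-delimited format inherited from the ba-dls-deepspeech repository credited below:

import json

# Description files are line-delimited JSON: one utterance per line.
with open('train_corpus.json') as f:
    example = json.loads(f.readline())
print(example['key'], example['duration'], example['text'])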

  9. Create an IPython kernel for the aind-vui environment, and open the notebook.

python -m ipykernel install --user --name aind-vui --display-name "aind-vui"
jupyter notebook vui_notebook.ipynb

  10. Before running code, change the kernel to match the aind-vui environment by using the drop-down menu. Then, follow the instructions in the notebook.


Suggestions to Make your Project Stand Out!

(1) Add a Language Model to the Decoder
The performance of the decoding step can be greatly enhanced by incorporating a language model. Build your own language model from scratch, or leverage a repository or toolkit that you find online to improve your predictions.
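
One lightweight approach is shallow fusion: rescore the decoder's candidate transcripts by adding a weighted language-model log-probability and a length bonus to the acoustic score. A minimal sketch, in which every name (the candidates list, the lm_log_prob function, and the alpha/beta weights) is hypothetical:

def rescore(candidates, lm_log_prob, alpha=0.8, beta=1.0):
    # candidates: list of (transcript, acoustic_log_prob) pairs from your decoder.
    # lm_log_prob: any function mapping a transcript to its log-probability.
    best = max(candidates,
               key=lambda c: c[1] + alpha * lm_log_prob(c[0]) + beta * len(c[0].split()))
    return best[0]
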
(2) Train on Bigger Data
In the project, you used some of the smaller downloads from the LibriSpeech corpus. Try training your model on some larger datasets - instead of using dev-clean.tar.gz, download one of the larger training sets on the website.
(3) Try out Different Audio Features
In this project, you had the choice to use either spectrogram or MFCC features. Take the time to test the performance of both of these features. For a special challenge, train a network that uses raw audio waveforms!
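
As a starting point for that comparison, the sketch below loads one utterance (hypothetical path) and prints the shape of each feature type; scipy computes the spectrogram and python_speech_features the MFCCs:

from scipy import signal
from scipy.io import wavfile
from python_speech_features import mfcc

rate, audio = wavfile.read('LibriSpeech/dev-clean/sample.wav')  # hypothetical path

# signal.spectrogram returns (frequencies, times, power matrix of shape bins x frames).
freqs, times, spec = signal.spectrogram(audio, fs=rate)
print('spectrogram, frames x bins:', spec.T.shape)
print('mfcc, frames x coefficients:', mfcc(audio, samplerate=rate, numcep=13).shape)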


Special Thanks

We have borrowed the create_desc_json.py and flac_to_wav.sh files from the ba-dls-deepspeech repository, along with some functions used to generate spectrograms.
