Speech Synthesis with Microsoft SAPI

Introduction

Microsoft provides a capable tool for both speech recognition and speech synthesis: the Microsoft Speech API (SAPI). Here I'll introduce the features you can use when synthesizing speech with SAPI. A full sample application is available in the downloads.

Using the Code

For speech synthesis, we get the System.Speech.Synthesis namespace. The main class is SpeechSynthesizer, which runs in-process. You can send its output to the default audio device or to a WAV stream or file, and then simply call SpeakAsync() with the text to be spoken.

For customization, you can specify the voice to use. Right now, on Vista, we only get one voice: 'Microsoft Anna'. I've seen some other demos with a voice called 'Microsoft Lili', which I believe speaks Chinese. What was really interesting about that voice is that it could also speak English, which made it sound like a native Chinese speaker speaking English ... very cool. Supposedly you can get other voices by installing the MUI packs on Vista, but I have yet to track any of these down to try them out. XP should have some other voices, such as 'Microsoft Mary' and 'Microsoft Sam'. You can also customize the synthesizer's volume and speaking rate. For 'pitch', you can change a prompt's emphasis, or do this in SSML using the <prosody> tag.
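
To check which voices are actually installed before choosing one, a minimal console sketch along these lines may help. It assumes a project reference to System.Speech.dll; the class name VoiceList is only illustrative, and 'Microsoft Anna' is simply the voice named above, which may not exist on your machine.

```csharp
using System;
using System.Speech.Synthesis;

class VoiceList
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            // Print every voice the synthesizer reports on this machine.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2})", info.Name, info.Culture, info.Gender);
            }

            try
            {
                // Assumption: this voice name is only an example.
                synth.SelectVoice("Microsoft Anna");
            }
            catch (ArgumentException)
            {
                // The named voice is not installed or is disabled; keep the default voice.
                Console.WriteLine("Voice not available, using the default voice.");
            }

            synth.Speak("Hello World.");
        }
    }
}
```

Listing the voices first avoids hard-coding a name that only exists on one Windows version.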

Speaking, in general, is as simple as this:

// Requires a reference to System.Speech.dll and: using System.Speech.Synthesis;
SpeechSynthesizer synth = new SpeechSynthesizer();
synth.SpeakAsync("Hello World.");

Volume ranges from 0 to 100, and rate ranges from -10 to +10 with a default of 0. You can change them through the properties synth.Volume and synth.Rate, respectively. If you are working with WPF or Silverlight, you can simply bind a slider to them. The SpeechSynthesizer class has two methods for speaking:

Speak(): Speaks synchronously; the call blocks the current thread until the speech has finished.
SpeakAsync(): Queues the speech and returns immediately, so the speaking happens off the calling thread. Changes made to the volume or rate while a prompt is being spoken won't affect that prompt; they take effect from the next one.
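
As a rough sketch of the difference between the two methods, the console program below sets the volume and rate, speaks once synchronously, and then speaks asynchronously while waiting on the SpeakCompleted event. The class name SpeakModes and the chosen values are illustrative assumptions, not part of the original sample.

```csharp
using System;
using System.Speech.Synthesis;
using System.Threading;

class SpeakModes
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            synth.Volume = 80;  // valid range: 0 to 100
            synth.Rate = -2;    // valid range: -10 to +10

            // Speak() blocks until the whole sentence has been spoken.
            synth.Speak("This call returns only after the sentence is finished.");

            // SpeakAsync() queues the prompt and returns right away;
            // SpeakCompleted fires when the prompt has finished.
            synth.SpeakCompleted += (sender, e) => done.Set();
            synth.SpeakAsync("This call returns immediately.");
            Console.WriteLine("SpeakAsync returned.");

            // Keep the process alive until the asynchronous prompt is done.
            done.WaitOne();
        }
    }
}
```

In a WPF or Windows Forms application the wait handle would not be needed, since the UI message loop keeps the process running.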

One of the coolest features is that we can use Wave files (.wav) directly as an I/O medium for the sound. You can send the output to the default audio device, a wave file, an audio stream, and so on. Here is an example of wave file output:

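// Redirect the output to a wave file (created if it does not exist).
synth.SetOutputToWaveFile("output.wav");
// Speak() blocks, so the audio has been written when it returns.
synth.Speak(textBox1.Text);
// Switch back to the default audio device, which also releases the file.
synth.SetOutputToDefaultAudioDevice();
MessageBox.Show("done");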

Basically, what I've done here is first set the output to a Wave file (it will be created if not present) and then made the synthesizer speak into that file. After that, I put the output back to the default audio device for further operations.
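
The same pattern works for an in-memory stream instead of a file. The following is a minimal sketch, assuming a console project that references System.Speech.dll; the class name StreamOutput is illustrative.

```csharp
using System;
using System.IO;
using System.Speech.Synthesis;

class StreamOutput
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        using (MemoryStream stream = new MemoryStream())
        {
            // Send the synthesized audio into the stream instead of the speakers.
            synth.SetOutputToWaveStream(stream);
            synth.Speak("Hello World.");

            // Detach the synthesizer from the stream before using the data.
            synth.SetOutputToNull();

            Console.WriteLine("Captured {0} bytes of audio.", stream.Length);
        }
    }
}
```

A stream is convenient when the audio is going to be sent over a network or post-processed rather than stored on disk.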

Now, there is an XML-based language from the W3C called SSML (Speech Synthesis Markup Language) that standardizes how speech delivery is described. With it, you can specify in detail how a piece of text should be spoken. Fortunately, SAPI supports this standard, so we can generate a prompt from SSML and speak it directly.

Some useful SSML tags:

audio: Plays a recorded audio file, such as a Wave file, as part of the speech.
emphasis: Specifies that the enclosed text should be spoken with emphasis.
enumerate: An automatically generated description of the choices available to the user. It specifies a template that is applied to each choice in the order they appear in the menu element, or in a field element that contains option elements. Note that enumerate, like menu and field, is defined by VoiceXML rather than by SSML itself.
phoneme: Specifies a phonetic pronunciation for the contained text. The format of the representation is vendor-specific, and does not always use the IPA alphabet. See your vendor documentation for details.
prosody: Specifies prosodic information for the enclosed text, such as pitch, contour, range, rate, duration, and volume.
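
To give a feel for how a couple of these tags look in practice, here is a small sketch that passes inline SSML to SpeakSsml(). The markup and the class name SsmlDemo are illustrative assumptions; how strongly emphasis and prosody are rendered depends on the installed voice.

```csharp
using System.Speech.Synthesis;

class SsmlDemo
{
    static void Main()
    {
        // A complete SSML document: SpeakSsml() expects the <speak> root element.
        string ssml =
            "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" +
            "This sentence is plain. " +
            "<emphasis level=\"strong\">This one is emphasized.</emphasis> " +
            "<prosody rate=\"slow\" pitch=\"low\">This one is slow and low.</prosody>" +
            "</speak>";

        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.SpeakSsml(ssml);
        }
    }
}
```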

You can speak the contents of an SSML file using PromptBuilder, like this:

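PromptBuilder pb = new PromptBuilder();
pb.AppendText("Hello..");
try
{
  // AppendSsml(string) takes the path of an SSML file, not inline markup.
  pb.AppendSsml("SSML.xml");
}
catch (Exception exc)
{
  // Shown if, for example, the file is missing or is not valid SSML.
  MessageBox.Show(exc.Message);
}
synth.SpeakAsync(pb);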

Another interesting feature of this API is that we can get the XML (SSML) form of whatever we speak. The reason I say it's important is that SSML can work as an intermediate language between programs written on any platform. For example, you compose something to be spoken on an ASP.NET website, create the XML from it, and pass it to a Web Service. That service can then hand the same file to its Java-based client. In this way, you get good interoperability between the two programs. Here's how to work with it:

PromptBuilder myPrompt = new PromptBuilder();
myPrompt.AppendText(textBox1.Text);
MessageBox.Show(myPrompt.ToXml());
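
For the interoperability scenario described above, the generated SSML would typically be saved or transmitted rather than shown in a message box. A minimal sketch, assuming a console project and a hypothetical output file name prompt.xml:

```csharp
using System;
using System.IO;
using System.Speech.Synthesis;

class PromptToSsml
{
    static void Main()
    {
        // Build a prompt in code; no audio device is needed for this step.
        PromptBuilder prompt = new PromptBuilder();
        prompt.AppendText("Your order has shipped.");
        prompt.AppendText("Please keep the receipt.", PromptEmphasis.Strong);

        // ToXml() returns the SSML that describes the prompt.
        string ssml = prompt.ToXml();

        // Assumption: "prompt.xml" is only an example file name.
        File.WriteAllText("prompt.xml", ssml);
        Console.WriteLine(ssml);
    }
}
```

Another program can then read the file and hand it to whatever SSML-capable synthesizer it uses.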
