C# Speech Recognition - Is this what the user said?

C#Speech RecognitionText to-Speech

C# Problem Overview


I have need to write an application which uses a speech recognition engine -- either the built in vista one, or a third party one -- that can display a word or phrase, and recognise when the user reads it (or an approximation of it). I also need to be able to switch quickly between languages, without changing the language of the operating system.

The users will be using the system for very short periods. The application needs to work without the requirement of first training the recognition engine to the users' voices.

It would also be fantastic if this could work on Windows XP or lesser versions of Windows Vista.

Optionally, the system needs to be able to read information on the screen back to the user, in the user's selected language. I can work around this specification using pre-recorded voice-overs, but the preferred method would be to use a text-to-speech engine.

Can anyone recommend something for me?

C# Solutions


Solution 1 - C#

A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:

public partial class Form1 : Form
{
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  void Form1_Load(object sender, EventArgs e)
  {
    var c = new Choices();
    for (var i = 0; i <= 100; i++)
      c.Add(i.ToString());
    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }

This recognizes the numbers from 1 to 100, and displays the resulting number on the form. You'll need a form with a label called lblLetter on it.

System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)

It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).

As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(

As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")

Solution 2 - C#

[Note: I was the development lead for the managed speech recognition API in .NET 3.0]

System.Speech is part of .NET 3.0, so it is available on both Vista and XP. In Vista you have the added benefit of having a speech recognition engine pre-installed by the OS. On XP you choices are: use the SAPI 5.1 SDK with a very old engine (but might work well enough for your command and control scenario), install Office 2003 which installs a newer version of the recognizer. There are a few SAPI 5 complient speech recognition engines available as well.

If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class which allows you to choose the SR engine for the language you need to support. Note that engines are defined by a set of languages they support (they might be using the same binary, only swapping data files to support additional languages).

Comment if you need to know more.

Philipp

Solution 3 - C#

Before this add 'Speech' reference

System.Speech

Found that the code example posted by Kyralessa on Oct 22nd didn't work for me but a slightly revised version did. When adding strings into the Choices object use full text English words not numbers. Seems the MS speech recognition engine can't recognize numbers by themselves.

I have marked these modifications with some commenting added to the previous example.

public partial class Form1 : Form
{
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  void Form1_Load(object sender, EventArgs e)
  {
    var c = new Choices();

    // Doens't work must use English words to add to Choices and
    // populate grammar.
    //
    //for (var i = 0; i <= 100; i++)
    //  c.Add(i.ToString());

    c.Add("one");
    c.Add("two");
    c.Add("three");
    c.Add("four");
    // etc...

    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }

Solution 4 - C#

If the engine is what you're asking about then I've found (beware, I'm just listing, I haven't tried any of them):

Lumenvox engine

you also have the SAPI SDK from Microsoft itself, I've only tried it for text to speech but according to its definition:

The SDK also includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).

Solution 5 - C#

Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.

That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.

That being says, consider Sphinx-4. It's an off-the-shelf solution written in Java available at http://cmusphinx.sourceforge.net/sphinx4/

Solution 6 - C#

Solution 7 - C#

Dragon Naturally Speaking SDK might be worth looking at. This project looked interesting.

Haven't got to play with either of them though.

Solution 8 - C#

Text to speech is available with the Speech API. Personally, I'd probably require Vista and use the managed interfaces to System.Speech.SpeechRecognition and System.Speech.Synthesis.TtsEngine, but a P/Invoke should be possible into the unmanaged APIs if you really need XP support.

Solution 9 - C#

Try Microsoft Speech Server, which I think now is part of Office Communication Server 2007. It contains a SR/TTS engines, C# API and tools that integrate with Visual Studio.

Solution 10 - C#

This is the article from MSDN magazine that first discussed using the System.Speech APIs for Vista. Some of it is out of date because the API changed between beta (when the article was written) and the release of Vista, but this is still one of the best resources I've found and covers a good intro to the System.Speech namespace. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx

Solution 11 - C#

Well, this question already has many good responses but I think it is valuable to update with some info from 2016 documentation the responses from Rob Segal and Philipp Schmid pointing to this nice code example:

https://msdn.microsoft.com/en-us/library/office/system.speech.recognition.speechrecognitionengine.aspx

It did not use the shared recognizer of Windows (The little Windows Mic that shows out up in the middle of the screen), it use a nice in app SpeechRecognitionEngine that not need any visual cue. The UI is completly at your control.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRichieACCView Question on Stackoverflow
Solution 1 - C#Ryan LundyView Answer on Stackoverflow
Solution 2 - C#Philipp SchmidView Answer on Stackoverflow
Solution 3 - C#Rob SegalView Answer on Stackoverflow
Solution 4 - C#Jorge CórdobaView Answer on Stackoverflow
Solution 5 - C#Robert ElwellView Answer on Stackoverflow
Solution 6 - C#stephbuView Answer on Stackoverflow
Solution 7 - C#itsmattView Answer on Stackoverflow
Solution 8 - C#Mark BrackettView Answer on Stackoverflow
Solution 9 - C#dbkkView Answer on Stackoverflow
Solution 10 - C#Michael LevyView Answer on Stackoverflow
Solution 11 - C#Fidel OrozcoView Answer on Stackoverflow