Java: Text to Speech engines overview

Java, Text-to-Speech

Java Problem Overview


I'm looking for a Java Text to Speech (TTS) framework. During my investigation I found several frameworks that are (partially) compatible with JSAPI 1.0, listed on the JSAPI Implementations page, as well as a pair of Java TTS frameworks that do not appear to follow the JSAPI spec (Mary, Say-It-Now). I also noted that no reference implementation currently exists for JSAPI.

Brief tests of FreeTTS (the first one listed on the JSAPI implementations page) show that it fails to read even simple and obvious words (examples: ABC, blackboard). Other tests are currently in progress.

And here are the questions (6, actually):

  1. Which of the Java-based TTS frameworks have you used?
  2. Which ones, in your opinion, are capable of reading the largest vocabulary?
  3. What about their voice quality?
  4. What about their performance?
  5. Which non-Java frameworks with Java bindings are there on the scene?
  6. Which of them would you recommend?

Thank you in advance for your comments and suggestions.

Java Solutions


Solution 1 - Java

I've actually had pretty good luck with FreeTTS.

Solution 2 - Java

Solution 3 - Java

Actually, there isn't much choice:

  • Festival, the oldest. Written in C++ but has Java bindings.
  • eSpeak, quick and simple, used by Google Translate.
  • MBROLA

Pure Java:

  • FreeTTS, whose code was ported from Festival; it was later open-sourced, and development has stopped.
  • MaryTTS, which is more powerful and looks production-ready.

There are also proprietary products such as:

  • Acapela
  • Nuance Vocalizer

If your software is Windows only, you can use Microsoft Speech API.
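For a native engine such as eSpeak, one low-tech alternative to Java bindings is to launch its command-line program from Java. The following is a minimal sketch, not something taken from the answer: it assumes an `espeak` executable is on the PATH, and the class and method names (`EspeakRunner`, `command`, `speak`) are made up for illustration. It uses eSpeak's `-v` (voice) and `-s` (speed in words per minute) options.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs the eSpeak command-line program as a child process.
final class EspeakRunner {

    // Builds the argument list: espeak -v <voice> -s <wordsPerMinute> <text>
    static List<String> command(String voice, int wordsPerMinute, String text) {
        List<String> cmd = new ArrayList<>();
        cmd.add("espeak");              // assumption: the executable is on the PATH
        cmd.add("-v");
        cmd.add(voice);
        cmd.add("-s");
        cmd.add(Integer.toString(wordsPerMinute));
        cmd.add(text);                  // passed as one argument, so no shell quoting is needed
        return cmd;
    }

    // Starts eSpeak, waits until it has finished speaking, and returns its exit code.
    static int speak(String voice, int wordsPerMinute, String text)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(voice, wordsPerMinute, text))
                .inheritIO()            // show eSpeak's own messages on our console
                .start();
        return process.waitFor();
    }
}
```

Calling `EspeakRunner.speak("en", 150, "Hello world")` should block until playback ends. Spawning a process per utterance is simple but adds start-up latency, so it may not suit applications that speak many short phrases.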

Solution 4 - Java

I've used Mary before and I was very impressed with the quality of the voices. Unfortunately, I haven't used any of the other ones.

Solution 5 - Java

I've used AT&T Natural Voices which provides JSAPI and MS SAPI hooks. It provides excellent quality voices, a good "general" speech dictionary, many controls over pronunciation, and multiple languages. It's a little pricey, but works very well.

I used it to read important sensor telemetry to drivers in a mobile sensor application. We had no complaints about the voice quality. It had about 75% out-of-the-box accuracy with scientific terms and a much higher rate (maybe 90%+) with normal dialogue. We got it up to about 99+% accuracy by using markup (most errors were on scientific terms with unusual phoneme combinations).

It was a bit hard on the processor (we were running on a Pentium-III equivalent machine and it was pushing 50%-75% peak CPU). This uses a native speech engine (Windows, Linux, and Mac compatible) with a Java interface.

There's a huge variety of voices and languages...
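The answer does not show which markup was used to raise accuracy on scientific terms. As a rough illustration only, the sketch below wraps known problem terms in the SSML `<sub alias="...">` element, which tells an SSML-aware engine to speak the alias instead of the written form. Whether a given engine accepts SSML is an assumption to check against its documentation, and the class and method names (`PronunciationMarkup`, `alias`, `markUp`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: marks up hard-to-pronounce terms with SSML <sub alias="...">.
final class PronunciationMarkup {

    // Written term -> form the engine should speak. Register longer terms first
    // if one term is a prefix of another, because the first alternative wins.
    private final Map<String, String> aliases = new LinkedHashMap<>();

    PronunciationMarkup alias(String term, String spokenForm) {
        aliases.put(term, spokenForm);
        return this;
    }

    // Returns an SSML fragment (without the enclosing <speak> root element).
    String markUp(String text) {
        if (aliases.isEmpty()) {
            return escapeXml(text);
        }
        StringBuilder alternation = new StringBuilder();
        for (String term : aliases.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        // Single pass over the text, so inserted markup is never matched again.
        Matcher m = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String term = m.group();
            out.append(escapeXml(text.substring(last, m.start())));
            out.append("<sub alias=\"").append(escapeXml(aliases.get(term))).append("\">")
               .append(escapeXml(term)).append("</sub>");
            last = m.end();
        }
        out.append(escapeXml(text.substring(last)));
        return out.toString();
    }

    // Escapes the characters that are special in XML text and attribute values.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

For example, after `new PronunciationMarkup().alias("NaCl", "sodium chloride")`, calling `markUp("NaCl level < 5")` wraps `NaCl` in a `<sub>` element and escapes the `<`. Terms are matched on word boundaries, so a term that starts or ends with punctuation would need different handling.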

Solution 6 - Java

Thanks a lot, everyone. The trick is in the FreeTTS source. Briefly: when run as java -jar freetts.jar some-more-args-here, it reads fewer words than when run via bin/Server.jar and bin/Client.jar.

Solution 7 - Java

I used FreeTTS but had a major problem getting the MBROLA voices to run on my MacBook Pro. I did get MBROLA voices to run on Windows (painfully) and Linux. I've had no luck loading any other voice packages on FreeTTS, which is a shame because the supplied voices are horrible IMO. Outside of that, I had a little success with Cloudgarden as well, but that only runs on Windows AFAIK. I'd be interested to hear others' successes and failures with voice engines, as this type of work is particularly challenging. I'm also toying a bit with Sphinx4. I just pulled down JVXML (which appears to be based on Sphinx4) last night but could not get it to run for some strange reason.

Solution 8 - Java

I've contributed to Mary. I feel it has potential if someone smarter than me separated the HMM voices out of the core (those voices don't need large data sets and sound OK). I'm also trying to add an event system to FreeTTS that sends an event when it says a word. I've had success, but it is currently broken on Linux (probably because of a timer bug).

Solution 9 - Java

I'm fairly comfortable with MaryTTS. It supports multiple languages and has a clear, easy-to-understand voice.

To convert speech to text, the better option is sphinx4-5prealpha. I give it a thumbs up because its recognizer and grammar are adjustable, flexible, and modifiable.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | DiaWorD | View Question on Stackoverflow
Solution 1 - Java | pfranza | View Answer on Stackoverflow
Solution 2 - Java | nvrandow | View Answer on Stackoverflow
Solution 3 - Java | Sergey Ponomarev | View Answer on Stackoverflow
Solution 4 - Java | Ryan | View Answer on Stackoverflow
Solution 5 - Java | James Schek | View Answer on Stackoverflow
Solution 6 - Java | DiaWorD | View Answer on Stackoverflow
Solution 7 - Java | Cliff | View Answer on Stackoverflow
Solution 8 - Java | i30817 | View Answer on Stackoverflow
Solution 9 - Java | susan097 | View Answer on Stackoverflow