Text to sound conversion now available.

Anything Goes | Carlos | 5 May 2011 | 6 comments

Any text file, at speeds from 25wpm through 140wpm, with any interval, using Cepstral’s Lawrence voice.

Free, but I’d like to share the resulting files.

If you have Python 2 installed, you can have a copy of the program. A few lines need to be changed for Python 3. You’ll also need a copy of Cepstral’s Lawrence voice, and the Swift program that comes with it.

Over 140wpm might be possible, but I’ll have to play with secondary variables. Swift won’t do it automatically.

Other voices can be used, but aren’t calibrated. Swift’s stated wpm are very inaccurate.

Cricket

(by Cricket for everyone)

6 comments

Carlos says:

7 May 2011 at 4:45 pm

About the wpm count, is Swift actually counting words? If that is the case, it would need to be modified for Gregg.

I downloaded "Miguel" to check out the Spanish voice ("Marta", the other Spanish voice, was 85 Mb, vs. "Miguel" which was 33 Mb!). He doesn't sound good, especially when you play it slow.

Cricket Onebit says:

8 May 2011 at 5:08 pm

No, Swift is not counting the words, and the program does not trust what Swift calls rate and measures in wpm. Swift's wpm setting is very inaccurate, and worse for some voices than others.

I agree, the super-slow files are almost impossible to listen to. For anything under 85wpm, the program keeps the rate parameter around 120 and changes the time between words.
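The arithmetic behind the between-word delay can be sketched roughly as follows. This is only an illustration, not the program's actual code: the function name is made up, it assumes the voice has a single measured speaking speed when no pauses are added, and it ignores the pauses the voice adds on its own at punctuation.

```python
def pause_between_words(word_count, target_wpm, speaking_wpm):
    """Seconds of silence to add in each gap between words.

    speaking_wpm is the voice's measured speed with no added pauses
    (a hypothetical input; it has to be measured, not read from Swift).
    """
    if word_count < 2:
        return 0.0
    total_seconds = word_count * 60.0 / target_wpm     # length the file should have
    speech_seconds = word_count * 60.0 / speaking_wpm  # length of the speech alone
    extra = total_seconds - speech_seconds
    if extra < 0:
        raise ValueError("target is faster than the voice; raise the rate instead")
    return extra / (word_count - 1)  # one gap fewer than there are words
```

For a 600-word passage at 60wpm, with the voice really speaking at 120wpm, this works out to about half a second per gap.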

Yes, some voices are pretty bad. Some sound like there are two voices at once for some sound pairs. Lawrence is good enough, but I can't stand Allison, even though Allison has a higher sampling rate. It doesn't sound as good as a human reader, but with the program, I can type in the passage, then let the software do all the rest for a full set of speeds.

To calibrate, I gave it a passage of 600 words from GSF2 (the last chapter that has a transcript), and the word-count from the book. It sends the passage to Swift, along with trial delays and rates, and finds the actual length of each sound file. From that, it calculates calibration parameters. Currently, I have to edit the program directly to put in those parameters.
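A minimal sketch of that calibration loop, written for Python 3. The assumptions are loud ones: the sound files are WAV, the file length at a fixed rate grows in a straight line with the delay, and all function names are invented for the example. The real program's curve may have a different shape.

```python
import wave

def wav_seconds(path):
    """Length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

def actual_wpm(word_count, seconds):
    """True speed of a finished file, using the book's word count."""
    return word_count * 60.0 / seconds

def fit_line(xs, ys):
    """Least-squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def delay_for_target(intercept, slope, word_count, target_wpm):
    """Invert the fitted line: which delay gives a file of the target length?"""
    target_seconds = word_count * 60.0 / target_wpm
    return (target_seconds - intercept) / slope
```

Here `xs` would be the trial delays and `ys` the measured lengths from `wav_seconds`; the intercept and slope play the part of the calibration parameters.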

The program will test any passage against any range of speeds. The curves are accurate to within 1wpm over the entire range (25-140wpm), at least for that passage. I haven't tested it against many passages yet, but doubt it will be worse than 2wpm. Since passages with the same syllable-count can vary widely in difficulty, I didn't refine it further.

It is possible to tell it to run several tests on each passage and create one of exactly the correct length, but that triples (or worse) the time for each passage.
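One way the several-tests approach could work is a simple bisection on the delay. This is a sketch, not the program's method: `render_seconds` is a hypothetical callable that creates the sound file with a given delay and returns its measured length, and the code assumes a longer delay always gives a longer file.

```python
def tune_delay(render_seconds, target_seconds, low=0.0, high=5.0,
               tolerance=0.5, max_tries=20):
    """Bisect on the delay until the file is within tolerance of the target.

    Every try costs one full render, which is why this is slow.
    Returns the last delay tried.
    """
    delay = (low + high) / 2.0
    for _ in range(max_tries):
        delay = (low + high) / 2.0
        seconds = render_seconds(delay)
        if abs(seconds - target_seconds) <= tolerance:
            break
        if seconds < target_seconds:
            low = delay   # file too short: need more silence
        else:
            high = delay  # file too long: need less silence
    return delay
```

Starting `low` and `high` close to the calibrated estimate would keep the number of renders small.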

Cricket Onebit says:

8 May 2011 at 5:18 pm

Ranges above 140 will need a different calibration curve, thanks to an anomaly in Swift's rate setting. I didn't bother finding it, since I'm only at 60wpm, but it should be easy enough to do.

Carlos says:

11 May 2011 at 5:23 am

Have you tried Natural Soft? It's a free download.

Cricket Onebit says:

11 May 2011 at 2:34 pm

Hi McBud,

Natural Soft won't work as well for several reasons, but none of them is a deal-breaker. I went with Cepstral initially because their sales rep was the only one to get back to me about super-slow speeds.

NS does not support the command line, so my program can't call it directly. Converting text to a specific speed would be two steps: Use my program to create the ssml file(s) with the delay tags, then run that file through NS. You can run batches of files within NS, so that wouldn't be too bad. Until recently, that's what you had to do with my program as well.

(Tech info: My program creates ssml files, which are regular text files with extra codes. The voice program takes the ssml file and other parameters to create the sound file. The latest version calls Cepstral directly, saving the user a step.)
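To show what such a file looks like, here is a minimal Python 3 sketch that puts an SSML `<break>` tag between words. The function name is invented, and the bare `<speak>` root is an assumption: some engines may want version and namespace attributes on it.

```python
from xml.sax.saxutils import escape

def words_to_ssml(text, pause_ms):
    """Wrap text in a minimal SSML document with a fixed pause between words."""
    brk = '<break time="%dms"/>' % pause_ms
    # Escape &, < and > so stray characters in the passage don't break the XML.
    body = (" " + brk + " ").join(escape(word) for word in text.split())
    return '<?xml version="1.0"?>\n<speak>\n' + body + "\n</speak>\n"
```

Calling `words_to_ssml("Dear Sir", 250)` gives a file whose body is `Dear <break time="250ms"/> Sir`.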

Also, you'd have to remember which ssml file goes with which speech rate setting to get each speed. It might be possible to include the rate in the file header automatically.

NS isn't as free as it looks. The only free voice is Microsoft Mary, which comes free with Windows. NS doesn't make voices. It sits between other programs (like Word) and other companies' voices. NS makes it easy to buy other companies' voices. The free voices are good enough for this purpose.

It may not support ssml codes (which add space between words), but there is hope. It supports SAPI 5 XML, which is similar. It's hard to say how long it will take to follow the circles and red herrings to find the necessary codes.
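One candidate is SAPI 5 XML's `<silence msec="..."/>` tag, which plays the same role as SSML's `<break>`. A minimal Python 3 sketch of the delay step using it follows. It is untested against NS: whether NS passes the tag through to the voice is an assumption, and the function name is invented.

```python
from xml.sax.saxutils import escape

def words_to_sapi_xml(text, pause_ms):
    """Put a SAPI 5 XML silence tag of pause_ms milliseconds between words."""
    silence = '<silence msec="%d"/>' % pause_ms
    # Escape &, < and > so the passage text can't be mistaken for tags.
    return (" " + silence + " ").join(escape(word) for word in text.split())
```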

The built-in speed setting isn't as fine-grained as Swift. That's not a big problem, since the fine-tuning is done with the delay between words.

Let me know if you want me to look into it. The hardest part will be tracking down the SAPI codes. Only a few lines of the program need changing for proof of concept testing.

Cricket

Carlos says:

11 May 2011 at 3:07 pm

Cool. Thanks for the info!
