Text to Speech Programs for Dictation

Has anyone tried using text-to-speech programs (rather than buying dictation tapes or repeating the same thing at different speeds)?

I’ve played with the one built into XP. Normal speed seems to be about 240 wpm; the slowest is about 80, but it sounds horrid: the words are drawn out rather than having time added between them. I barely manage 30 wpm for passages I’ve already practiced, so this won’t work for me.

I could repeat each word in the source two or three times, or would that just make a mess of things?

Also, any advice on speeds while still learning the theory? Assuming the form stays good, how fast should I get before moving to the next chapter? Is there a progression? X wpm for chapter 1, Y for chapter 2?

(by cricketbeautiful-1 for everyone)

13 comments
  1. There's a short "Text-to-speech Programs" thread here:


    Don't forget to check out the Gutenberg site for your free
    dictation material!

    I think there's still a paper in the documents section about a "pyramid" speed building program.

    Also, the forum abounds in advice about taking dictation. Scroll through the older threads.

    One piece of advice that I recall is to take dictation at both lower speeds (at which you are almost competent), and higher speeds which stretch your ability. Don't correct as you go, but only afterward.

    Good luck!

  2. Another important point when you're learning shorthand is to use material you have already studied for dictation, not new material. This will do two things: (1) reinforce the vocabulary, and (2) give you the confidence to achieve the desired speed. Also, if you want, keep your book open while you're taking dictation. It gives you a "security blanket" in case you get stuck.

  3. I couldn't find the pyramid speed building program in the documents. Is it there under a different name? Also, is it specific to Gregg, or does it apply to other systems as well?

    Thanks in advance,

    30 wpm cold, up to 47 wpm with drill! (Yep, baby steps.)

  4. Letter to Cepstral, makers of text-to-speech software:

    I would like to use a text-to-speech program to help me learn shorthand.

    I need to be able to enter material keyed to the text, and have it read at
    different speeds, typically 20 to 200 wpm.

    1. I would like to be able to specify the rate in wpm. It doesn't have to
    be exact. I think 1.4 syllables per word is average. Increments of 5 wpm
    would be good.

    2. I would need slower speeds, starting at 20 wpm. Below about 80, though,
    time must be added between the words; simply slowing the voice down makes
    it unintelligible.

    This would be a good tool for any person learning shorthand. Practice tapes
    keyed to the books are very expensive (or unavailable), but there is
    usually plenty of printed material in the textbook. Reading the passages
    aloud at different speeds to create practice tapes is very difficult.

    Shorthand is still a useful skill, for meetings when a computer is not
    available or recording is not allowed. Machine shorthand is also used
    professionally for closed-captioning and court recording, including
    interviews. Journalists in the UK are required to write pen shorthand at
    100 wpm for basic journalist certification. Pen shorthand is better than a
    recording for this because it is less obtrusive, faster to read back, and
    easier to go back and check a quote.

    So there is definitely a market for this sort of program.

    Thanks for your consideration,

    A Few Resources:
    Active newsgroup: http://groups.msn.com/GreggShorthand/_whatsnew.msnw
    UK Journalism: http://www.nosweatjt.co.uk/npaper.htm
    Sample dictation files: http://www.stenospeed.com/
    Other jobs using shorthand: http://www.bls.gov/oco/ocos152.htm

  5. Reply from Cepstral (took about 2 days, not too bad)


    Hi Cricket,

    Thank you for your interest in Cepstral voices. This is an interesting
    request. I think you can do something like this by using SSML
    (http://www.cepstral.com/cgi-bin/support?page=faq&type=ssml). I direct your
    attention to the <break> tag.

    It may be possible to contract an engineer to write a simple application
    that would put a <break> between words such that the effective speaking rate
    is X words per minute. We do not offer such functionality natively. I see
    your point that simply slowing the speech down to a crawl isn't what you're
    after…you're looking for normal speech, only with pauses between words so
    you can focus on shorthand skills.

    Kind regards,
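
    For anyone curious what such a "simple application" might look like, here is a minimal shell sketch. It assumes plain text with words separated by spaces, and the helper names break_seconds and add_breaks are made up for illustration; neither is part of any Cepstral tool. The pause is whatever is left of each word's time slot at the target rate once the voice has spoken the word.

```shell
# Sketch only: break_seconds and add_breaks are hypothetical helper names.

# Seconds of silence needed after each word so that a voice speaking at
# VOICE_WPM ($1) averages TARGET_WPM ($2) overall:
#   time slot per word at the target rate = 60 / TARGET_WPM
#   time spent actually speaking the word = 60 / VOICE_WPM
break_seconds() {
    awk -v voice="$1" -v target="$2" 'BEGIN {
        pause = 60 / target - 60 / voice
        if (pause < 0) pause = 0   # target is faster than the voice itself
        printf "%.2f\n", pause
    }'
}

# Read plain text on stdin and replace each run of spaces with an SSML
# <break> sized for the requested voice rate ($1) and target rate ($2).
add_breaks() {
    pause=$(break_seconds "$1" "$2")
    sed "s/  */ <break time=\"${pause}s\"\/> /g"
}

# Example: slow a 170 wpm voice to an effective 40 wpm.
echo "the quick brown fox" | add_breaks 170 40
# → the <break time="1.15s"/> quick <break time="1.15s"/> brown <break time="1.15s"/> fox
```

    The pause is only an average: long words take longer to speak than short ones, so the real rate will drift a little around the target.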


  6. I was going to suggest that very technique in my post above. I've tested it myself and it works fine. The problem is that while it's trivial to perform that kind of mass-markup with good old unix tools like sed (it's literally a one-line program), there's nothing in the average user's Windows world that can save them from doing it by hand, which would be nightmarish.

    I've already written just this kind of program that Craig from Cepstral is talking about for pre-processing text for audio books, and it would only take about an hour to adapt it to do the steno practicing job. I even considered emailing it to them. Problem is, again, it's a unix shell script. And I don't want to bother learning how to write it in C for other platforms.

    I'll post the complete steno-pre-processing program on my own site within a couple of weeks, but you'll need a unix environment to run it. If you're running OSX, it should work fine from a command prompt. If you're running Windows you'll need to install Cygwin. Otherwise, install Linux.

  7. Just so happens I probably have Cygwin; I think hubby uses it when he telecommutes. And he wants his next machine to be Linux…

    Actually, with a bit of practice, that sort of thing is reasonable using search and replace in TextPad. It uses Regular Expressions that look a lot like PHP's, or it can use POSIX.

    Further experiments and results:

    I tried NaturalReader (Demo) on a sample of marked-up text, and it read all the markups. I then viewed it in a browser, and told NaturalReader to read the browser; I got a regular-speed voice.

    I couldn't figure out how to install Cepstral. They had instructions for OSX, but not WindowsXP.

    Any ideas? (Or maybe I should be working on the learning rather than the learning tools.)


  8. My Cepstral voice processes ssml perfectly, but maybe ssml support is not standard for all tts brands(?).

    > I couldn't figure out how to install Cepstral. They had instructions for OSX, but not WindowsXP.

    I guess you already read:

    > (Or maybe I should be working on the learning rather than the learning tools.)

    Not even half as fun.

    If you've already got Cygwin, try a command like:

    $ cat file.txt | sed 's/ / <break time="1.8s"\/> /g' >file.ssml


    $ swift -m ssml -f file.ssml -o file.wav

    Then play the file. It works for me.

    Given Cepstral's standard 170wpm rate, each word takes about 60 / 170 = 0.35 seconds to speak, so with that 1.8 second break between words the resulting rate would be 60 / (0.35 + 1.8) = about 28 wpm …I think. I personally feel Cepstral's 170 is a bit quick, and so would prefer:

    $ cat file.txt | sed 's/ / <break time="1.8s"\/> /g' >file.ssml
    $ swift -p 'speech/rate=140' -m ssml -f file.ssml -o file.wav
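
    The effective rate of commands like these can be estimated with a little arithmetic: each word costs 60 / rate seconds of speech plus the break, so the effective rate is 60 / (60 / rate + break). A rough sketch follows; effective_wpm is a made-up helper name, not a swift option, and it ignores any extra pauses the voice adds at punctuation.

```shell
# Sketch only: effective_wpm is a hypothetical helper, not part of swift.
# Each word costs (60 / voice_wpm) seconds of speech plus the break, so
#   effective wpm = 60 / (60 / voice_wpm + break_seconds)
effective_wpm() {
    awk -v voice="$1" -v pause="$2" 'BEGIN {
        printf "%.1f\n", 60 / (60 / voice + pause)
    }'
}

effective_wpm 170 1.8   # → 27.9  (default voice rate, 1.8 s break)
effective_wpm 140 1.8   # → 26.9  (slower voice, same break)
```

    Note how little the voice rate matters once the break dominates: dropping from 170 to 140 only costs about one word per minute, so the break length is the setting to adjust for speed.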

    It's as simple as that. All I would add in a script is that speech rate algorithm, maybe a couple of command line switches to make it convenient, and some other standard markup. Here's the one I currently use for audio books:

    # ssml-prime — adds the basic ssml markup I like.

    sed '/^$/{
    ' [email protected] > $(echo [email protected] | sed 's/.[^.]*$/.ssml/')
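
    The `sed 's/.[^.]*$/.ssml/'` step at the end of that script derives the output name by swapping the input file's extension for .ssml. Here is that step on its own as a small sketch. The function name ssml_name is made up for illustration, and the dot is escaped here (an assumption about the intent) so that it matches a literal "." and leaves a name with no extension alone.

```shell
# Sketch only: ssml_name is a hypothetical helper name.
# Replace the last extension of a file name with .ssml. The escaped dot
# matches a literal "." only, so names without an extension pass through
# unchanged instead of being rewritten.
ssml_name() {
    echo "$1" | sed 's/\.[^.]*$/.ssml/'
}

ssml_name chapter1.txt    # → chapter1.ssml
ssml_name notes.old.txt   # → notes.old.ssml
```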

    Let me know if that does it.
