Computerize dictation

Having a programmer in the house can be handy!

Hubby has written an online utility which, combined with a (free) voice synthesizer, will dictate text at the speed you choose.

It does this by adding time between the words rather than slowing down the words. (Slowing down the words too much becomes incomprehensible.)

Functional rather than fancy, but reliable. (I notice he’s even put in code for invalid entries, but I haven’t tested it.)

Quick Instructions

Enter the text on the page as the form says. Hit “Translate”.

Copy the resulting text into a text-to-speech program such as Cepstral. It has to recognize SSML (Speech Synthesis Markup Language, similar to HTML and XML).

Press “Play.”

More Details

The program replaces spaces with the “break” command. I’m not sure whether he handled other whitespace, or what happens if there is more than one space in a row. (He also did dishes that night, so I didn’t get too picky.)
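
For the curious, the replacement step might look something like the Python sketch below. This is a guess at the idea, not the actual code behind the utility: the function name `add_breaks` is made up, and it assumes the delay is given in milliseconds and written as an SSML `<break time="..."/>` tag. It treats any run of spaces, tabs or newlines as a single gap, and escapes characters such as `&` and `<` so they don't upset the markup.

```python
from xml.sax.saxutils import escape


def add_breaks(text: str, delay_ms: int = 750) -> str:
    """Join the words of `text` with SSML break tags (hypothetical helper)."""
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    # split() with no argument collapses runs of spaces, tabs and newlines,
    # so two spaces in a row still produce only one break.
    words = text.split()
    tag = f'<break time="{delay_ms}ms"/>'
    # escape() protects &, < and > so stray characters do not break the markup.
    return f" {tag} ".join(escape(word) for word in words)
```

For example, `add_breaks("Dear Sir", 750)` gives `Dear <break time="750ms"/> Sir`, ready to paste into a synthesizer that understands SSML.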

I use Cepstral’s SwiftTalker.

They have an online demo that sends you a WAV file.

To get SwiftTalker, download and install any of their free voices; it gets installed with the voice. If using a free voice, it sticks “buy me” in every now and then.

SwiftTalker will save to a WAV file (File / Export). Audacity or Nero should be able to convert it to MP3 or a CD or whatever.

You may have to go into “tools / options / text handling” so it handles SSML. You can also play around with the speed in WPM. Lots of things to play with.

I tried calculating wpm vs break using my own formula, but obviously missed a variable or three because it didn’t match experimental data. If anyone wants to record some numbers showing Cepstral’s wpm, delay time, and actual results, I’ll post the results.
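
One obvious first guess at such a formula, sketched in Python: assume the synthesizer spends 60 / (its wpm setting) seconds on each word, and that the break has to make up the rest of the 60 / (target wpm) seconds each word is allowed. The function name and the millisecond unit are assumptions for illustration, and this is not claimed to be the formula mentioned above. It ignores word length, punctuation pauses and the “buy me” interruptions, so treat its answer as a starting point to be corrected by measurement.

```python
def naive_break_ms(target_wpm: float, engine_wpm: float) -> float:
    """Estimate the break per word, in milliseconds (assumed unit).

    Assumes every word takes exactly 60 / engine_wpm seconds to speak,
    which real speech does not do.
    """
    if not 0 < target_wpm <= engine_wpm:
        raise ValueError("target_wpm must be positive and no faster than engine_wpm")
    seconds_per_word_wanted = 60.0 / target_wpm
    seconds_per_word_spoken = 60.0 / engine_wpm
    return (seconds_per_word_wanted - seconds_per_word_spoken) * 1000.0


if __name__ == "__main__":
    # Engine at its default of 170 wpm, aiming for 50 wpm dictation.
    print(round(naive_break_ms(50, 170)), "ms per break (naive estimate)")
```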

When and if we re-organize the site, there might be a nicer front-end to the program, with more instructions. You will be able to reach it by searching for the word “dictation”.

I asked him to handle up to 500 words. It might handle more.

I look forward to hearing whether this works. I can probably get him to improve it next weekend or the one after, if I do dishes, so please give us feedback.



(by cricketbeautiful-1 for everyone)

21 comments
  1. I get so excited every time I see Gregg take another step into the 21st Century. Tell your other half that we would gladly pay him for this excellent service had we a group bank account, and hope that he will accept as payment instead his heavenly reward! I tried it out using Windows Text To Speech, but it didn't work, as you probably knew (I'm dumb). So I'll try to figure out how to do it the right way when my office inbox doesn't look so scary.

  2. I can only say: oh, WOW! This is the niftiest thing I've played with in ages! The Cepstral SwiftTalker program is truly amazing. I downloaded Allison, and she's now my new friend. You can enter text directly (without markup language), and adjust the dictation speed, from 85 to 340 wpm. If you want slower speeds, you will need to add SSML tags, as Cricket showed above. I entered a small letter from Gregg Speed Studies, Third Edition, and had it dictated at 146 wpm. The program created a WAV file, which I converted to MP3. The result can be seen in the Documents Section, Audio Files folder. Other than the annoying reminder for registration, it is pretty darn good! Thank you for sharing, Cricket.

  3. 22MHz "Callie" has been my best friend for about a year. The dictation
    practice is great, but what I really like is being able to go through
    the Gutenberg library on the drive to and from work, cleaning the
    dishes, etc. I've "read" double the number of books and essays this
    year. No question it was worth the $30.

    Any mathematics-types out there? I've been using a shell script to
    add those word breaks, but I can't figure out the algorithm to go from
    a WPM rate to the break length. Given: 150 speech rate from the TTS
    program, and 60 seconds in a minute, how would you get the break-per-word
    rate number?

  4. Yikes! Allison sounds gleefully predatory, and I did see Halloween stuff at Walmart already to set the stage.   Seriously, though, all this stuff y'all are talking about is quite amazing, and I wish I were not so ignorant of computer stuff – it sounds so complicated!   Just think, too, it really does solve the problem of getting dictation material – no end to what you can type in as you wish!   Jim

  5. I'm not sure that you can use a formula — the length of the words is too variable. And my new best friend, David Cepstral, seems to put in very slight elongation of vowels for emphasis.   Seems to me that we'd somehow have to take into account the exact length of the take when spoken at exactly 100 or 150 etc, and then divide by the number of words… blah blah blah   You'd still have to know exactly how many of the seconds are used by the speed before you can slow it down.   Am I making sense?   sidhe 

  6. Sorry. Clearly I'm not a mathematical genius.   I did a few tests, and found that 105 words with a total of 113 syllables makes a WAV file of 49.42 seconds.   So I put in 138 words with a total of 148 syllables (105 x 1.4 = 147) and got a WAV file of 63.76 seconds.   So it looks like they are using a standard word of approximately 1.4 syllables, maybe?   sidhe

  7. Thanks sidhetaba, your assistance has been immortalized (as far as the web can be considered permanent). I'll add more info as it comes in.

    As for exactness, we'll have to settle for something "close enough", unless we want to reopen the debate on how to count text that's got longer words, or if it's susceptible to heavy abbreviations, or unusual words, or peak vs sustained, or …

    Let's standardize on this text:

    There was once a velveteen rabbit, and in the beginning he was really splendid. He was fat and bunchy, as a rabbit should be; his coat was spotted brown and white, he had real thread whiskers, and his ears were lined with pink sateen. On Christmas morning, when he sat wedged in the top of the Boy's stocking, with a sprig of holly between his paws, the effect was charming.

    70 words
    89 syllables
    1.27 syllables / word (Not standard 1.4, but I like the story.)

    We have to do it experimentally. As an engineer, I can tell you we're in good company; most useful (as in, we can build with it now) formulas began as experimental results rather than ivory-tower theories.

    Good point about different speakers; if we don't record them, we won't know if there's a difference. Also, we should record the wpm from the tool menu; the default is 170.

    As always in the real world, the experimental protocol changes as the experiment is conducted.

    Thanks again!


  8. Sorry, neglected to say that I chose 105 wpm from the Tools-Options menu, and that's why I used 105 words / 147 syllables.   It seemed to me that if we wanted, say, a dictation at 50 wpm, it would be easier to start with a lower wpm on the voicing system.

  9. Wow! I'm thinking of the different possibilities

    I will experiment using the free voices …but I'm also planning to pay the $30 as well.

    I don't know what my speed in wpm is yet. I think it is 50 wpm in the Anniversary version.

    Thanks cricket.

  10. Thanks for spotting that.

    It's now fixed.


    Grrrrr, I'd added just a bit of text, saying the experiments were with Cepstral's David. Except, there's a quotation mark in there, which didn't match other quotation marks, and messed up the program. Such a simple little change,…

  11. I did some timing experiments this morning, and discovered things that should have come as no surprise.

    1. My days as an engineer are totally over. 15 years of documentation and housewife math have me back at a grade 11, if that. Sigh. Got some nice Excel graphs out of it.

    2. If you change the syllabic intensity, you change the results, especially at the lower speeds.

    The full results are at


  12. Well, I did it. I finally shifted from "advocate of computerized dictation" to user.

    A "delay" of 750 gives 43 to 47 wpm (measuring from the first word to be dictated till the end of the passage, including the time spent on the "please buy me" phrases). This was measured using the first three units of the Anniversary manual.

    I'd already pushed these passages up to about that speed by looking at the text and then back to my writing.

    The words are sometimes hard to hear. In normal speech there are no gaps between words; the brain inserts them. The end of one word and the beginning of the next affect each other. So, I suspect the program pronounces the words as if they were joined, rather than one word at a time. E.g., the word "go" sounds like "o".

    I'm guessing the hard to hear bit is slowing me down as much as flipping between text and writing used to.

    It's somewhat nerve-wracking to do this. At times, resisting the temptation to hit "pause" was slowing me down. I'll have to review the advice on how to take practice dictation.

    It's not perfect, but better than the available alternatives.

  13. It's official. A free voice, at delay of 750, is 45-50 wpm.

    I used David. I timed 8 separate passages of 50-100 words each. I started timing with the first word to be written and stopped when the first word of the next passage started.

    I counted the advertisements as extra writing time but not the words. Some passages had more advertising time than others. I suspect a registered voice would be consistently 50 wpm.

    I could easily gain speed by counting the words of the advertisement, and, since they're repetitive, creating a shortcut.



    If you replace the space between words with a dash, no delay is put between them. This is good when learning phrases, but makes the wpm more controversial.


  14. Does anybody know where I can get shorthand speed pins? Used or new – doesn't matter. I love Gregg Shorthand and I'm currently studying machine shorthand. It would be neat if I could reward my friend and me with pins every time we pass a speed class. Thanks for your help. Please email me at [email protected] (there's an underscored space between "Sherry" and "Payton").

  15. More test results:

    Again using the free David voice, not including the time for the initial "buy me" bit. Including the time for the ads after that, but not counting those words. SwiftTalker set for 170 wpm, the default. Using all the text in chapter 2 Anni, about 768 words. A word is a word; I didn't count syllables.

    Delay   WPM
    1250    33
    900     40
    650     50
    450     60
    300     70