On writing fiction with voice-recognition software


  1. Geoffrey Parker says:

    I think David Weber does all his writing with speech recognition software.  And there is a lot of it.

  2. Nonentity says:

    As a programmer, server admin, and networking guy: I wish voice recognition were even that useful for me.

    For the writing problem, though… maybe it would be possible to record what you wanted on a cheap audio device, and then play the result into VRS?  Then you could go through afterwards and correct things.  That would also allow you to read and hear the writing at the same time while doing corrections, which might make it easier to identify places that need heavy reworking.

    Neither way is ideal, but knowing how bad humans can be with accent differences doesn’t give me much hope of software outperforming us on dictation any time soon.

    • gadgetgirl says:

       An expensive, high-fidelity recording device might work, but a cheap one won’t. VRS is incredibly picky — even just adjusting your mic a different way from the usual can create different results.

      On another note: I read an article about what it physically takes to write with a goose quill as a pen, and what it must have been like for Shakespeare to get the words down on the page. A writer would have to stop and either dip the quill in the inkwell or sharpen the quill several times per page — talk about flow interruption. I think the moral of the story is that when you change one aspect of how you physically write (VRS vs. typing it in, writing with quill & ink vs. ballpoint), you have to change all of it.

      • Nonentity says:

         True.  If you did it in a quiet room and were able to feed the sound directly into the computer from the device, though, it might not be too bad.  If a decent, low cost device wouldn’t work, then I would have to wonder whether a better microphone for the computer itself would help him.

        Of course, even if it worked it wouldn’t address the third complaint he has: for most types of text, it’s just slower to speak than to type even if you can expect the listener to understand you.  Oh, for the day when we can plug our brains directly into the computer and write at the speed of thought…

        • Chrs says:

           Er, the speed argument you’re making is not strictly right.  Only very advanced typists can hit 120 words per minute, but normal speech is usually 150+ words per minute.


          • Nonentity says:

             Hmm, good point.  Though I have to wonder about “normal speech” being consistently that high.  The Wikipedia page gives that rate as the one recommended for audiobooks, which probably takes less thought than normal speech (and a rehearsed presentation is noted as slower).

            I suppose I was probably thinking of how much slower speech is when on the receiving end.  But in the context of trying to write a book, I wouldn’t be surprised if the speed difference ended up weighted more towards typing.

          • LennStar says:

            For voice recognition I always have to slow my speech down by about a third (or it gets really messy). And even then the software sometimes takes ages to “write”, during which you could have typed another sentence.

            So in the end it is about the same speed, provided that:
            - you don’t have direct speech, because the quotation marks take time
            - you don’t use uncommon words or names
            - you don’t care too much about spelling while writing, but correct it later on

            And it is much more fun to type.
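            As rough arithmetic with the figures from this exchange: taking Chrs’s 150 words per minute for normal speech and LennStar’s one-third slowdown, dictation comes out at about 100 words per minute, somewhat below the 120 Chrs cites for very advanced typists, and that is before any recognition lag or correction time is counted. This is a back-of-the-envelope sketch only; the constants are the commenters’ estimates, not measurements.

```python
# Back-of-the-envelope words-per-minute comparison.
# The constants are estimates quoted by commenters, not measured values.
SPEECH_WPM = 150        # "normal speech is usually 150+" (Chrs)
FAST_TYPIST_WPM = 120   # "very advanced typists" (Chrs)
SLOWDOWN = 1 / 3        # LennStar slows down by about a third for the software


def dictation_wpm(speech_wpm: float, slowdown: float) -> int:
    """Effective dictation speed after slowing down by the given fraction."""
    return round(speech_wpm * (1 - slowdown))


if __name__ == "__main__":
    dictated = dictation_wpm(SPEECH_WPM, SLOWDOWN)
    print(f"dictation: ~{dictated} wpm vs. fast typing: {FAST_TYPIST_WPM} wpm")
```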

    • xian says:

      What would be really cool is VRS that records an audio track while transcribing your speech, allowing you to scrub back through the audio track while at the same time highlighting the text on screen.

      • Nonentity says:

        I have to wonder whether something like this has been created for medical or court records, since it seems like an obvious improvement over plain paper transcripts created from dictation.  Storage might be an issue, but you could probably compress the audio quite a bit once the initial transcription is done.

      • M Alovert says:

        I believe Dragon DOES do that and has for ages. I don’t remember how long the different versions (Pro, etc.) keep the audio (it might be just a page or two), but it’s totally useful when you’re trying to identify a mistranscription later. I think the medical/professional versions are designed for people who rely on that functionality, i.e. professionals or doctors whose secretaries or transcriptionists need to track down errors in the transcript after the fact.
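      The feature xian describes amounts to keeping word-level timestamps alongside the transcript, so the playback position can be mapped to a word and highlighted. The sketch below is a hypothetical illustration of that lookup, not how Dragon or any other product implements it; the `Word` record, its fields, and the timings are invented for the example (the transcript borrows grandmapucker’s “hair density” mistranscription from further down the thread).

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word plus its position in the recorded audio (seconds)."""
    text: str
    start: float
    end: float


def word_at(words: list[Word], t: float) -> int | None:
    """Index of the word being spoken at playback time t, or None in a gap.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None


def highlight(words: list[Word], t: float) -> str:
    """Render the transcript with the word under the playhead in [brackets]."""
    i = word_at(words, t)
    return " ".join(f"[{w.text}]" if j == i else w.text for j, w in enumerate(words))


if __name__ == "__main__":
    transcript = [
        Word("lose", 0.0, 0.3),
        Word("his", 0.3, 0.5),
        Word("hair", 0.5, 0.8),
        Word("density", 0.8, 1.3),
    ]
    # Scrubbing to 0.9 s highlights the suspect word, so you can hear what was said.
    print(highlight(transcript, 0.9))  # lose his hair [density]
```

      A binary search over start times keeps the lookup cheap even for a book-length transcript, which matters when the highlight has to follow the audio as you scrub.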

    • p96 says:

      Could this be a good task for Amazon’s Mechanical Turk, at least for the initial transcription?

    • headcode says:

      On topic, if you haven’t already seen it, this is a must-view (it eventually becomes NSFW):

  3. Am I the only one who wants to go read some classical literature into voice-recognition software, just to see what it spits out?

    Edit: Yeah, other people probably have lives.

  4. MrEricSir says:

    Imagine trying to write software like this.


    You’d probably end up with this:

    pound include < String >

    • Tynam says:

       Sadly, since I broke my hand recently, I don’t currently have to imagine.  Even with all the tricks I already had set up to deal with my RSI, it took about ten minutes to give up, call the boss, and arrange to be in meetings for the next six weeks instead.

      Technical documentation is also bad, sometimes worse, because the output can look right at a casual glance.

  5. Andrew Smith says:

    My wife does medical transcription for a living, and most medical transcription services also offer non-medical transcription for other industries (she’s done subtitling for exercise videos, insurance claims, NCAA political videos).  I can’t imagine that sci-fi would be harder than medical transcription, especially for somebody like my wife who is already a sci-fi geek.  I don’t know how the rates compare, but they are usually set by word or page count, so it would be easy enough to extrapolate.

  6. sofong says:

    Given an inability to type, I’d go for dictation onto a recording medium that could then be typed out. Asimov was said to have used this tech, and I know of a couple of master’s theses successfully done this way.

  7. amnyc says:

    “Warm artichoke had an is at orange night light raining when come lit.”

    VRS: bad for writing fiction, great for writing Dadaist poetry.

  8. dainel says:

    I’m surprised nobody has mentioned India yet …

  9. grandmapucker says:

    I tried this recently, and it didn’t go well. “Lose his inheritance” came out as “lose his hair density.” Which may also have happened, but it wasn’t what I was after. I write longhand and then type it in (I can then edit as I do so), but I have learned not to go so long without typing stuff up. It took so much longer to fix the dictation software’s mistakes than it would have taken to just type it up normally.

  10. Joe Vanegas says:

    My voice recognition software is far better than that. I can’t write fiction in any medium, so I won’t opine about different tools for that task. If I have more than a paragraph or so in my head, I can get more of it out before forgetting, and do so faster, with VRS. The microphone adjustment is not super critical, but being in a quiet place is. My hopes of dictating in the car were thoroughly quashed!

  11. Here’s an essay by Richard Powers on how he does all his writing by voice:
    “… For one, I can write lying down. I can forget the machine is even there. I can live above the level of the phrase, thinking in full paragraphs and capturing the rhythmic arcs before they fade. I don’t have to queue, stop, batch dispatch and queue up again. I spend less mental overhead on orthography and finger mechanics and more on hearing my characters speak themselves into existence. Mostly, I’m just a little closer to what my cadences might mean, when replayed in the subvocal voices of some other auditioner.”

  12. lorq says:

    I admit I’m puzzled by the article.  I use Dragon Dictate and don’t experience  anything like the error rate the author seems to.  Perhaps it’s a question of which software package is being used…?  Also, the article makes an issue out of the occasional mismatched word as though that derails the whole process.  But — why?  The trick with VRS, for me at least, has been to use it to generate an initial draft, without pausing at the end of every sentence to correct errors.  That’s what the second draft is for.  In my experience, VRS has been a huge productivity boost.

  13. Alessar says:

    I bought Dragon to try this out for NaNoWriMo and I just couldn’t get past the small errors. For instance, if I said “select blah” it would select the word blah. If I said “strike that”, which the tutorial said was the command to delete the current selection, Dragon would deselect blah, jump to the next instance of the word “that”, and delete it. *headdesk*

  14. M Alovert says:

    I love doing stream-of-consciousness writing with voice recognition software. From the errors you’re describing, I’m guessing that your speech pattern is difficult for it to understand (the software should be trainable, but some accents and speech artifacts are more difficult, I think), or that your microphone is a problem.

    For technical stuff or work that involves a lot of formatting, it isn’t as useful to me, but Dragon was good enough at picking up my speech that it worked almost exactly as advertised for blog posts, articles, and other stream-of-consciousness writing.

    It took me some $150 worth of headset experiments to figure this out, though: the microphone made a very, very large difference in how clearly it picked up my ramblings.

    Also, it took a huge mental shift for me to be able to compose that way. I used to think that typing was a thought-to-paper-with-no-secondary-thinking sort of communication, but I eventually got to the point where dictating to the program was effortless as well.

    And yes: writing lying down. Or in a hammock on a sunny day. Or anywhere outside on a bright day, where screen visibility stopped mattering (at least for the first draft).

    • Clive says:

      This has been my experience too! I can’t use Dragon to compose *formal* prose, because for that I need to edit the words of a sentence even as I’m composing it, something that is extremely hard to do with dictation software.

      However, Dragon works incredibly well for generating *informal* prose: Email, chat, (some) blog posts, brainstormed ideas, discussion-board posts, etc. And as it turns out, the volume of informal prose I generate is vastly larger than the volume of formal prose.

      For me, the biggest problem with dictation software is that it suffers from the autocorrect problem. If you make a typo while keyboarding, you can generally spot it pretty easily when you rescan the text for errors. But Dragon doesn’t make typos — it makes substitution errors, just like autocorrect on phones. Substitution errors are much harder to spot when you’re copy-editing or re-scanning a text, so the likelihood of writing an email with a few completely berserk substitutions is high.
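      Clive’s distinction can be made concrete with a toy dictionary check: a typo usually produces a non-word that stands out (and that a spell checker flags), while a substitution produces real words that sail through. The word list and sentences below are invented for illustration; a real spell checker uses a far larger dictionary, but it has the same blind spot.

```python
from __future__ import annotations

# Toy "spell checker": flags only tokens that are not in a known-word list.
# KNOWN_WORDS is a tiny stand-in for a real dictionary.
KNOWN_WORDS = {"lose", "his", "hair", "density", "inheritance"}


def unknown_words(text: str) -> list[str]:
    """Return the tokens a dictionary lookup would flag as misspelled."""
    return [w for w in text.lower().split() if w not in KNOWN_WORDS]


if __name__ == "__main__":
    typo = "lose his inheritnace"           # keyboard typo: transposed letters
    substitution = "lose his hair density"  # dictation substitution error
    print(unknown_words(typo))           # ['inheritnace']
    print(unknown_words(substitution))   # [] (nothing flagged)
```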

  15. millie fink says:

    Richard Powers, five years ago:

    Except for brief moments of duress, I haven’t touched a keyboard for years. No fingers were tortured in producing these words — or the last half a million words of my published fiction. By rough count, I’ve sent 10,000 e-mail messages without typing. My primary digital prosthetic doesn’t even have keys.


  16. I am a closed captioner for the hearing impaired and use voice recognition software to do my job every day.  I do live news programming like CSPAN, Bloomberg, BBC and a ton of local news affiliates.  Our terms with the stations guarantee a 97% accuracy rate and we get live on-air quarterly reviews to make sure we are consistent.  My accuracy is rarely below 98%.

    Our voice recognition system (the company’s proprietary software) is far from perfect.  Too many words sound alike for it to work perfectly, but it is pretty remarkable just how well it does work.  That said, I can’t imagine writing with it.  I think it would be distracting and make for more work in the editing process.  

    Talking is also much more physically taxing than writing.  You know how your hands cramp up after a while at the keyboard?  Imagine doing it to your voice.  I can’t do my job when my voice burns out or I have a sore throat.  But I can use a keyboard.

    Next time you watch CSPAN, turn on the captions.  Now imagine there’s someone in a studio somewhere repeating everything the speaker is saying into VRS.  Notice that the captions appear nearly simultaneously with what is being said.  It’s pretty fucking rad.  I think in 10 years, voice recognition will be so good and so omnipresent, we’ll all take it for granted.

  17. Janvier says:

    Another annoying example of Boing Boing using a TLA (three-letter acronym) without explaining it.  I had to Google RSI.  C’mon, boingsters: better writing, please.  I know I’m talking to the wind here.

    • KBert says:

       Likewise… I’m still showing a tab “rsi – Google Search”

    • Lexica says:

       Maybe I’m old and cranky, but the idea that somebody can use computers and the net regularly  enough to follow Boing Boing but not know what RSIs are surprises me. Didn’t keyboard jockeys practically “invent” RSIs back in the ’90s? (Not really, of course, but the incidence of RSIs went through the roof.)

  18. It sounds to me as if the writer needs to get a better mic. It’s that simple. I use a wireless broadcast lapel microphone and it works much better than even my Rode NT1000 studio mic. Dragon screams along with it, and over the last year the error rate has plummeted.

  19. Dave Morris says:

    Based on your example, I might quite like a novel written by unedited VRS. The first human-machine stochastic literature!

  20. eddbagenal says:

    So, as a professor of computing and interaction design, I have for some time been telling my students to prepare and design with the aim and the assumption that many of the prevalent human-computer interfaces we see today should collapse. This is because, quite simply, in most cases they get in the way of what we are trying to do. Computing of course enables huge efficiencies and a wider scope of ability, but the points at which we meet computers still cause much friction, and so must improve incrementally until this is no longer the case. This will increasingly mean that existing barriers to entry into all manner of disciplines requiring professional expertise will disappear, in much the same way as we have seen barriers to entry into professional news media production systematically removed. It is fun to read you here, Mr. Doctorow, an evangelist for this very same progressive shift in the structure of these hierarchies, dismissing with such confidence the use of VRS (voice recognition software) as a writing tool. In its current form, perhaps, but let’s not forget what this is actually about at its core, Cory. The keyboard, the pen, writing itself: these remain the barriers to entry into the (your) realm of professional storytelling.

    Marshall McLuhan
    – “A goose’s quill put an end to talk, abolished mystery, gave architecture and towns, brought roads and armies, bureaucracies. It was the basic metaphor with which the cycle of civilization began, the step from the dark into the light of the mind. The hand that filled a paper built a city.” – Counterblast, 1954

    Talk. Storytelling. Word of mouth. We have only been writing en masse for a short span of time compared with the time we have been speaking. The technologies and strictures of the alphabet, the word, and written language are huge obstacles (technologies themselves) to be overcome in childhood, if we are lucky. Clearly they are worthwhile obstacles in terms of communication, but for the folk art of storytelling, the idea that you need to sit quietly tapping at a selection of buttons for a month, like some lab rat, is the very opposite of the spirit of the storyteller. Having seen some of your talks, I can tell you are fortunate enough to be gifted as a storyteller both in person, by mouth, and through your writing. But surely we all understand that the very best storytellers, gifted spinners of yarns, are very often simply unable, whether for lack of time or of inclination, to write their tales down. Spoken-word stories, as we know, were the only means by which most cultures’ mythologies and allegories (their spirits) were preserved through generations. The keyboard is a blip in the narrative of storytelling. There will soon be a time when a full record of all the conversations we have throughout our lives, should we so choose, will be easily available as text: searchable, printable, ready for distribution, recall, etc. Who would not like, were it possible, to pick out an ancestor, a favourite celebrated figure, or a folk hero of any description, pick through their day-to-day conversations, and find the gems where they hold forth with some great tale? Many other applications spring to mind, but we are talking about stories here.

    Please take this in the spirit in which it was intended. You write and deliver some fantastic and inspirational ideas, but language and stories are central to so much that will save us, and so dismissing the potential (inevitable?) emancipation of stories from the professional practice of writing seems cruel.

  21. pjk says:

    Dragon VRS learns from you and loads a different profile for each user. It’s pretty good right out of the box, but if you’re patient over a few hours and put some work into going back and correcting its mistakes, the software learns your quirks and preferences and vocal inflections. I use it for translation (which obviously has to be very precise) and it flies. My issue with VRS for fiction would be that it doesn’t give me a chance to pause and arrange my thoughts, but that’s a matter of process rather than tech.

  22. cmpalmer says:

    Why not just leave out the machine step? Dictate your writing, hire a typist, then hunt-and-peck your way through edits and corrections (or dictate them while looking over someone’s shoulder).

    Not endorsing or condemning the work he produces (I’ve only read a few of his books), but  I believe Kevin J. Anderson does all of his writing onto a digital recorder while hiking around the Rockies.

  23. jmdaly says:

    Blind and beloved humorist James Thurber wrote using the oldest speech recognition program of all time: a stenographer.

  24. Alan Katz says:

    Advice from someone who uses speech-to-text (S2T) all the time: “Diction is done with the tip of the tongue on the teeth.”

  25. Thad Boyd says:

    Pratchett did an interview on NPR a while back where he discussed, among other things, having to use speech-to-text software to write because he can no longer keep his focus to type.  His major complaint was that the software was American and didn’t recognize British English: “You don’t have words like ‘arsehole’.”

    The interviewer laughed and said “We have that one, we just pronounce it differently.”

  26. BBNinja says:

    If it’s important enough, and you can’t be bothered to type it yourself or physically can’t, then you could record audio and have it transcribed, or just hire a typist.

  27. bruce_a14 says:

    Thanks, PAPPP, for those links.  The Dasher program looks like it might actually be usable with a head tracker by my wife, who has both severe hand disabilities from decades of RA and hand and arm tremors as side effects of medications.  She’s tried voice-recognition software, but the frustration level is too high.

    Let me rephrase that first sentence:  THANKS!

