I had dinner with Mr. Jalopy a couple of nights ago and he mentioned an e-mail I'd sent to him earlier in the week, in which I described how much I like my recently-purchased Dragon Dictate 2.0 speech-to-text application.

He told me, "In your e-mail you said that you hoped that you didn't sound too chatty. You didn't sound too chatty, but you sounded like you were crazy."

I understand what he meant. It's a lot easier for me to talk than it is to type, because I am a pathetically clumsy typist. But typing forces me to slow down and organize my thoughts, instead of running off at the mouth. However, I am so pleased with Dragon Dictate's accuracy, and how it saves wear and tear on my hands and wrists, that I am sold on it and never want to go back to typing.

The application costs $154 on Amazon, which may seem steep, but it comes with a nice USB headset. It takes only about five minutes to train the application to learn how you speak. The last time I tried using a speech-to-text application, the training took much longer and the accuracy was terrible. Also, this was many years ago, and computers were much slower, so it took forever for the words to appear on the screen after I said them. I swore off speech-to-text programs until I heard Alex Lindsay on MacBreak Weekly talk about how much he loved Dragon Dictate. I decided to give it a try myself.

It's magical seeing my words appear on the screen almost as fast as I utter them. I think I would probably be too embarrassed to use Dragon Dictate in an environment where other people could hear me talking to my computer. I work alone in a home office, so it works for me.

  1. Does it understand non-American English? Dragon Dictate’s iPhone app doesn’t understand a single word I say in my South African accent. My wife, who’s American, has no problem getting the app to transcribe whatever she’s saying. Some years ago I tried training Microsoft’s transcription software to understand me, with very little success. I wonder whether things have improved for those of us who are accent-disadvantaged since then.

    1. I don’t know if this is what you’re running into, but the iPhone app seems to train to a specific user. It works very well for me (even though telephone voice menus can never understand me), but it thinks everything my 5yo says is “or or or or”.

    1. Yes, I did use Dragon Dictate to write a rough draft of this blog post. But then I edited it by hand, or by fingers, I should say.

  2. We use this all the time for the clients at my non-profit. It’s really awesome assistive technology. Great software.

    Nuance, the company that makes it, has probably the worst customer service outside of Quark, but the software is wonderful.

  3. I tried using this at work after some nasty wrist surgery. Unfortunately, it doesn’t work well if you need to write XML or code. It really needs a bunch of language modes. Hmm, dictation software + Emacs…

  4. See I’m afraid my letters and E-mails would read like “and then we went to…no wait…delete that…no wait stop dictation…wait don’t type that….Damn IT Stop!! Dragon off! Whatever…”

  5. Dragon Dictate is a big advertiser on our favorite commentator’s show. That would be Glenn Beck. I wonder if they have a Glenn Beck crying mode – you can cry out your right-wing, conservative wisdom.

  6. I wonder how StT tech is going to impact communications; the cognition loops between speaking and typing are different, and we (or at least I) think somewhat differently depending on the mode of expression. Somehow my speed of thought is different, my ability to hang onto thought bubbles is different. I’m particularly curious about the impact this is going to have on the realms of commerce, politics and law. It will be transformative, and the transformation will leave a discernible, if small, wrinkle in history, I’m sure.

    I am very much looking forward to lots of reports about how this new generation of StT works for people!

    This has been a dragon-free comment, but perhaps if Santa loves me very much and is reading BoingBoing…

  7. If it worked well, particularly for people without American accents (anyone know if it does?), my bf could cut out the middle man of dictating his reports on tape and having the secretary type them up and email them back. It would save time also.

  8. I used an earlier version of this for transcription work, when my wrists were tired. Anyone who transcribes interviews would be well served by it.

  9. I’ve been using Dragon products for a couple of years, and they’re great. I work as a translator and I can dictate in Spanish, English, French, German or Italian, and this software understands it almost perfectly (the accent can be an obstacle when dictating in a non-native language). This thing saves me A LOT of time and it’s also good for your wrists.

  10. For years, I have been using the powerful, full, free SAPI4SDK from Microsoft. For those using a Windows system, it is a complete text-to-speech and speech-to-text system, with sample applications and all the Visual Basic .ocx files and instructions to write your own speech-enabled applications, which is what I specialize in developing. While not a packaged consumer application like Dragon products, it does have functional speech-to-text and TTS applications, so if you do not mind looking at a developer’s package, you can start to use speech-to-text after installing it. I have been impressed with its functionality and usefulness. I do not have a link to download it, but I think it is still available. A Google search may find it. Highly recommended for those into exploring speech applications or into writing some, without a cash outlay first.

  11. What I am genuinely wondering is if this product can handle multiple speakers for, say, transcribing meeting notes. Most software I have tried totally blows at this, as it seems to be expecting just one speaker and trains itself to that voice, cadence and style. Adobe SoundBooth produces some very ad-lib-esque renditions. I suppose it might do better if I let it run the same file 4 or 5 times, but even on a high-end system that would take the better part of 2 days. Love the DD app for the iPhone, but then again that is a single-voice situation and is far from perfect. Any input on that?

    1. “What I am genuinely wondering is if this product can handle multiple speakers…it seems to be expecting just one speaker and trains itself to that voice, cadence and style.”

      Are you asking if it accepts multiple users, or are you asking if you could just play it a tape recording of a meeting and have it transcribe everything it hears on the recording? If the former, I don’t know. If the latter, no, that is not what these programs do. They recognize only your clearly enunciated voice as spoken through, ideally, a headset microphone in a quiet environment. It won’t recognize a bunch of people’s voices recorded from across the room with an air conditioner whirring in the background and papers shuffling and people coughing and side conversations going on about when the meeting will be over.

      What you do is put a headphone in your ear playing the tape of the meeting, and then you repeat in your own voice (which the program knows), clearly (which the program requires), everything you are hearing. Humans can’t retire from this role yet.

      1. Thanks for that idea; I hadn’t considered using myself as an intermediary. Though it’ll still be a pain in the ass, it’ll be less so than typing it out manually.

  12. Damn, I just realized that there is no Portuguese support on the DD iPhone app, bummer. Two kinds of Español and three kinds of English but no Portuguese, boo.

  13. Speaking as IT Support who has had to deal with Dragon Dictate for the last 5 years (since Dragon NaturallySpeaking 8.0), let me say that it is one of the worst applications I have ever had the displeasure of trying to work with. It crashes. It crashes other applications. It causes problems with software that shouldn’t be in any way affected. And as mentioned above, the support from the company really is terrible.

    That said, the users do seem to love it.

  14. I bought the premium edition of Dragon NaturallySpeaking for Windows to help build a computer that my quadriplegic father can use. It’s going to be a Christmas gift, but in the testing that I’ve done so far, it looks like a winner to me!

  15. I’m surprised so many people here haven’t encountered speech recognition before. I no longer have hand injuries but I still enjoy using it for certain kinds of work; it does speed up my (already very fast) “typing” significantly, even after accounting for error-correcting.

    I’ve been using Dragon NaturallySpeaking (the same thing, I believe, for PC) for years, since version 4 (it’s now at 11). I haven’t tried out the latest iteration yet, but here’s what I’ve learned from my experience and comparing notes with others:

    -it does better with some accents than others. In the US, deep southern drawls are problematic for instance. Knowing how many syllables some Southerners manage to stuff into each vowel, this isn’t surprising. Yes, you train it, and the longer you train it and the more time you take to correct errors, the better it gets.

    -it’s ALL about the headset (and the cheapo that ships with it is great). Once I broke my initial headset, I had to literally try out about 10 headsets before I found a gaming headset that worked better than most of the others. This is the main advice I have for people for whom it hasn’t worked well.

    -At least up to Version 10, it sucks when used within other programs (it is supposed to give you full cursor control and they claim that you can do programming and spreadsheet work without issues, but I can’t imagine how that would work consistently given the problems I’ve seen when it operates within another piece of software). I’ve tried various versions on many different installations of Windows and on different machines, and it’s always had the same set of problems for me.

    -It’s pretty flawless within its own text editor (“DragonPad”). I was not fully disabled when using it, so I just got in the habit of using DragonPad for long dictation (original writing, email, blog entries, etc.) and saved my hands for all the other kinds of work I did on the computer, where a lot of mousing was better than trying to get Dragon to place the cursor in the correct place or get it to use all the tools available in a browser or spreadsheet or word processor.

    -speaking of mice, my hand injuries didn’t improve until I found the right mouse and dramatically changed my mouse-operating habits. It’s not all about the typing. My favorite is a top-ball ball mouse that I learned to operate with either hand. People with more severe injuries have many ergonomic mouse/pointer options such as joystick-type mice, foot-operated mice, etc. Coupled with voice recognition, these can probably get you out of a lot of bad computer-injury habits and reduce strain on your body, but it requires a long period of re-learning posture and attentiveness to where your strain may be happening.

  16. I have often said that MacSpeech (who first developed Dictate) should have been paying me for all the free advertising I was doing for them. I also heard about it from Alex Lindsay, back before the app had a general release. It revolutionized my computer use, and I couldn’t shut up about it for months, proselytizing to everyone who would listen. It used to be that the only application that I could not do without (aside from a browser) was Photoshop, but Dictate has been added to that list.

  17. Does anyone have an opinion on the different products for the Mac? I.e. how does MacSpeech differ from Dictation, etc.?

  18. Are there any speech->text apps that work even moderately well for programming languages?

    I know a great programmer who has nearly lost use of both hands. It would be wonderful to get him working again – both for him and the industry.

    1. I’ve been able to do some programming by voice with a combination of Dragon (DNS 10 for Windows) and the open-source add-on Vocola, which allows really easy voice macro creation. This requires some tweaking of Dragon’s default settings (e.g., don’t automatically capitalize after punctuation) and creating custom vocabularies for different programming languages (adding both function names and some longer chunks, so that “for the length of…” generates “for(i in 1:length(…”, etc.).

      An awful lot of programmers get hand injuries. Resources are out there. speechcomputing.com is a good place to start.

      On a different note, imagine an office where people are having conversations and making telephone calls all day long. Oh wait, that’s every day in every office ever. I think that my monotone monologue with my computer (about half of which sounds like gibberish because I’m either writing code or issuing some kind of command) is actually less distracting than people having conversations which an accidental eavesdropper can understand. But the bigger point is, you get used to it when you have to.
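
The custom-vocabulary idea in the reply above (a spoken chunk such as “for the length of…” expanding into a code fragment) can be illustrated with a minimal sketch. This is plain Python, not Vocola's or Dragon's actual macro syntax; the `MACROS` table, the `expand` function and the second phrase are hypothetical names made up for illustration, and the input is assumed to be text the recognizer has already produced.

```python
import re

# Hypothetical macro table: spoken pattern -> code template (R-style output,
# following the "for the length of..." example in the comment above).
# These entries are illustrative only; they are not Vocola or Dragon syntax.
MACROS = [
    (re.compile(r"^for the length of (\w+)$"), "for (i in 1:length({0})) {{"),
    (re.compile(r"^define function (\w+)$"), "{0} <- function() {{"),
]


def expand(utterance: str) -> str:
    """Return the code snippet for a recognized phrase, or the text unchanged."""
    # Normalize case and whitespace, since dictation output varies.
    # Note: this also lowercases the dictated identifier.
    phrase = utterance.strip().lower()
    for pattern, template in MACROS:
        match = pattern.match(phrase)
        if match:
            # Fill the captured word(s) into the code template.
            return template.format(*match.groups())
    # No macro matched: pass the dictated text through as ordinary prose.
    return utterance


print(expand("for the length of scores"))
```

Anything that does not match a macro is passed through untouched, so ordinary dictation keeps working alongside the code phrases.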

  19. Even old versions of Dragon Naturally Speaking are remarkably accurate, as some have pointed out. Up to 99.999% accurate with enough training. I’ve been using it for years, with no complaints.

    It may only take 5 minutes to train it, but the more training you do, and the more you use it, the more accurate it becomes. I’m not sure if Dragon Dictate has the same correction features, but I would assume it does. This allows for even greater accuracy. In general, the accuracy improves with longer words.

    Getting it to distinguish between “bomb” and “Mom” consistently might take some time, though, especially if you’re prone to colds.

  20. I picked up Naturally Speaking 10 ($20 on eBay now that 11 is out). I was pleasantly surprised at how much it has improved since the last time I tried it. However, the cheap headset that accompanied the product didn’t work well at all, though my Otto gaming headset did.

    I see two problems preventing wider adoption of speech recognition: people don’t want to broadcast their thoughts as they compose a message (can you imagine an entire office full of people ‘typing’ out loud?), and people come across very differently in writing than they do in speech.

    The latter reason prevents me from using it. I find my thoughts are much more coherent when I take the time to compose them in writing. Plus, I mentally revise a sentence 3-4 times before typing it and for some reason I haven’t learned how to do that yet while speaking. Probably a learned aversion to pauses in speech, which might otherwise be filled with ‘uh…’ but can’t be when Dragon is transcribing your every sound.

  21. Part of me likes this because it is one step closer to the technology singularity and my being able to say “tea, earl grey, hot” and have some machine marvel whip one up for me.

    On the downside, it’s also another nail in the coffin of the transcription industry where I used to work before a large portion of the jobs were shifted to India.

    Hopefully the economy of the future will also improve at the same time.

  22. The description says it works with Safari, but would I be able to dictate a post on BB using Firefox as my browser?

  23. i have tried it and it did not work for me.
    first – i am a doctor, which is one profession that is said to benefit the most. true, radiologists, pathologists and others use it with great success. in forensics (my field) it does not work at all. we have no limited vocabulary; in our texts anything from street names to car brands to virtually all of anatomy is present.
    so of course you use the good implementation of letting the program crawl over a lot of old documents. i did that overnight with about 1000 autopsy reports and court statements. that worked well for town names, streets and local lingo. but it gave the program too many alternatives. so it became even more complicated and, most important of all, it would fill in words that were grammatically correct but totally twisted the sentence’s meaning. which, as you can imagine, is a total showstopper in the forensic field or any medical field for that matter.
    of course i used the german version. but even in the english version the difference between “blood” and “clot”, or “hypertension” and “hypotension”, can make a very very big difference for the patient. imagine you do not catch that error and order a nurse to give medication to lower blood pressure when low blood pressure was the problem in the first place. this can be fatal.

    also, the program is not honest in that respect. the story you read in the beginning to train it is more designed to impress the customer (see honey what i just wrote without my hands – awesome!) than to actually train the thing.
    in my version you could not interrupt the training. after initial disappointment i went for the extra long and thorough training, only to realize that when i had to go and shut down my computer i could not do so safely in the program. so 2 hours of reading a bedtime story to my computer was absolutely useless.

    also, my version required some serious patching out of the box. which sucked because it was more than 100 MB and had to be done from within the program. i did not have a good internet connection at the time and so i spent many hours downloading this amount with a tethered cell phone.

    if you read blogs from users, usually there are no big leaps in performance from version to version, so even though i might have had an older version than the one tested here, i guess many points in my critique apply.

    the program works fine for not-so complicated texts (which can be typed easily as well) and repetitive vocabulary (for which you could use text blocks).
    if your vocabulary is very rich and you train the program with text documents, you just add a source for errors which are really really exhausting to eliminate. it is not just the spelling and the errors which are easily found because “the text just does not look right”. it is the terrible annoyance that you must check every sentence for coherence, grammar, tense, overall consistency. many sentences come out just as correct sentences. it is just not what you said.

    the final blow comes when you have one long text where it all falls together. for me it was a forensic bid for court where someone was attacked by a dog, and there were several dogs involved. the prog could not distinguish between the german word for dog (Hund) and the german word for “and” (und).
    Imagine you write a recommendation on woodwork and in every second sentence the program gets “would” confused with “wood”. that’s when you throw it off your hard drive and out of the window.

    recognition rates as stated on the box are for advertisement and they are usually generated using simple text. even if the recognition was 95% (which it isn’t), the remaining 5% would make it a pain in the ass.

    it also depends on your typing skills whether you would need a prog like that or not.

