aNewDomain.net — In his continuing series covering speech recognition, our Lamont Wood asks the ultimate question. Of two leading options — Nuance Communications’ Dragon NaturallySpeaking (DNS) and Windows’ built-in Windows Speech Recognition (WSR) — which one is better? And at what? Deep dive …
For general purpose desktop dictation you have two choices: Windows Speech Recognition (WSR), an unheralded feature that’s actually built into Microsoft Windows, or Dragon NaturallySpeaking (DNS), an off-the-shelf software package from Nuance Communications.
So the question is, which one is better?
Well, WSR is free and is already installed. But you can’t use it until you acquire and install a suitable headset microphone and then figure out how to invoke the software.
As for Dragon NaturallySpeaking, you have to pay for DNS and install it. Yet it comes with the necessary microphone, not to mention far more extensive documentation.
After that top-line comparison, the easiest way to find out which program is better at what it was built for is to test both in the real world. How do the two perform with actual document production? I put each to the test. Here’s my product review shootout, pitting Windows Speech Recognition and Dragon NaturallySpeaking head to head.
For purposes of comparison, I used a widely circulated government document known as “Lincoln’s Gettysburg Address.” Typed, it has 271 words. Pronouncing all its punctuation marks, as you have to do with speech recognition, makes it 314 words long. Also for the purposes of comparison, I counted actual dictionary words rather than using the typing-test convention of five keystrokes per word.
First, I timed how long it took to type it in. Keyboarding the document’s 271 words took 314 seconds, representing 51 words per minute. There were 25 errors, giving an accuracy rate of 91 percent. Correcting brought the time to 475 seconds, for a throughput of 34 words per minute. So there’s the bar. Now compare that to what it took in Dragon NaturallySpeaking and Windows Speech Recognition.
Dragon NaturallySpeaking: Dictating the 314 utterances with DNS, I finished the draft in 159 seconds, for a rate of 126 words per minute. There were two recognition errors, for 99.4 percent accuracy.
Correcting the errors brought the process to 255 seconds, for a throughput of 74 words per minute.
Windows Speech Recognition: Dictating the 314 utterances with WSR, I finished the draft in 164 seconds, for a rate of 115 words per minute. There were seven errors, for 97.8 percent accuracy.
Correcting the errors brought the process to 258 seconds, for a throughput of 73 words per minute.
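For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the rates from the word counts, times and error counts quoted above. The function and variable names are illustrative assumptions, not part of either product. Note that with the stated 314 utterances and 159 seconds, the DNS draft rate works out to roughly 118 words per minute rather than the quoted 126, so one of those printed figures may be slightly off; the other figures reproduce as stated.

```python
def words_per_minute(words: int, seconds: float) -> float:
    """Rate in words per minute for `words` produced in `seconds`."""
    return words / seconds * 60


def accuracy_percent(words: int, errors: int) -> float:
    """Share of words entered correctly, as a percentage."""
    return (words - errors) / words * 100


# (word count, draft seconds, errors, seconds including corrections),
# copied from the test results in the article. The labels are hypothetical.
RESULTS = {
    "Keyboard": (271, 314, 25, 475),
    "DNS": (314, 159, 2, 255),
    "WSR": (314, 164, 7, 258),
}


def summarize(results=RESULTS):
    """Return draft rate, accuracy and corrected throughput per method."""
    rows = {}
    for name, (words, draft_s, errors, total_s) in results.items():
        rows[name] = {
            "draft_wpm": words_per_minute(words, draft_s),
            "accuracy": accuracy_percent(words, errors),
            "throughput_wpm": words_per_minute(words, total_s),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(
            f"{name:8s} draft {row['draft_wpm']:6.1f} wpm, "
            f"accuracy {row['accuracy']:5.1f}%, "
            f"throughput {row['throughput_wpm']:5.1f} wpm"
        )
```

Throughput here is simply the same word count divided by the longer, corrections-included time, which is why a faster but less accurate draft can end up level with a slower, cleaner one.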
Analysis and caveats
I had long felt that DNS was a little more accurate than WSR—and the results bear that out. Also, I have long felt that using either approximately doubles my throughput, and the results bear that out, too.
But there are caveats to consider.
The first caveat to think about is that I personally am quite skilled using both systems. I was alone in a quiet room for both tests and, after all, Lincoln’s Gettysburg Address does not present a challenging vocabulary. Your results will vary.
Meanwhile, this comparison involved dictation — that is, the input of text composed by someone other than you. Speed and accuracy are helpful with such tasks. But most office workers are trying to compose original text, so what matters is the system’s ability to keep up with them as they decide what to say.
How fast can you decide what to say? A solid day’s work for a professional writer is something like 1,000 to 2,000 publishable words.
Using either DNS or WSR you ought to be done for the day in about 20 minutes. Obviously, it doesn’t work that way.
In my experience, the rate at which you compose coherent, grammatical text is about 50 words a minute in sporadic bursts.
During the intervening periods you will be staring at the blank page, sweating blood. Anyone who’s completed a professional typing class should be able to type 50 words a minute. And doubtless people have produced 2,000 words a day using quill pens.
The real advantage …
Because speech recognition can’t accelerate the decision-making process — and because keyboarding is fast enough for composition (at least for me) — you must ask yourself what the advantage of using speech recognition really is.
In my experience, its advantage is that it makes the composition process essentially effortless. You decide what you’re going to say, you say it, and it’s there on the screen. The keyboard is no longer a stumbling block — nor a source of intractable muscle problems.
But you still have to get there from here. Converting to speech recognition is a non-trivial task that requires a commitment of several weeks, during which your productivity may fall. The trick is to remember the effort it took to learn to type originally. Learning speech recognition will not be as demanding, and the payout will be much greater.
For aNewDomain.net, I’m Lamont Wood.
Based in San Antonio, Texas, Lamont Wood is a senior editor at aNewDomain.net. He’s been covering tech for trade and mainstream publications for almost three decades now, and he’s a household name in Hong Kong and China. His tech reporting has appeared in innumerable tech journals, including the original BYTE (est. 1975). Email Lamont at Lamont@anewdomain.net or follow him @LAMONTwood.
This is killer.
Thanks Lamont, this was really useful.
There’s another reason for using these tools. I am a software developer/manager who over the years has developed arthritis in my hands. Without either of these tools my career likely would have ended. With that in mind, I will say in a world where you have to use this day-to-day I find Dragon very good at learning and translating my speech. However, Windows recognition absolutely dominates navigating the Windows environment. In all honesty, I wish you could combine the two applications.
That was a complete waste of my time. Your conclusion was, who cares, just type.
@claire – Well, for some of us, “just type” isn’t an option. My ability to use a keyboard, thanks to multiple sclerosis, is extremely limited. I have almost no feeling in my fingers, so I cannot touch type. I also lack precise finger movement control, so even if I am looking at the keyboard, I will get the wrong key 80% of the time; if I use just my right index finger alone, however, my accuracy rate is a tiny bit better than 50%, though WPM with one finger is pathetic.
Without speech recognition, and most especially without Dragon, I would have had to stop using computers more than a decade ago.
So I do care very much.
Oh, by the way, it would seem that you’ve never even bothered to give speech recognition a try. Contempt prior to investigation is not a good approach to learning.
Excellent article. Thank you for explaining the differences between these two voice recognition systems.
I recently saw a hand specialist about my carpal tunnel issues, and he recommended that I purchase and use Dragon NaturallySpeaking. After I did so, I learned that I had Windows Speech Recognition built into my computer. In my personal experience the two are also roughly equal for dictation; my difficulty comes with navigating within documents, as well as other programs.
Using the same microphone, I find that Windows Speech Recognition appears to understand my commands when I am in other programs, for example when opening and navigating within Word and Excel, as well as when using Chrome. While I’m sure there is a learning curve for both, my first impression is that Windows seems to understand my commands to do certain things better than Dragon does.
I have been using Dragon NaturallySpeaking Professional (it’s the same speech engine, but it allows me to create custom command scripts) since I was hit hard by multiple sclerosis in 2000. Back then, the software was very primitive, and its processing demands tended to overwhelm the fastest machines of that day.
Today, for dictation purposes, Dragon is amazing, and WSR is not far behind, as you discovered.
However, Dragon also shines at being able to natively dictate into any text field that is visible anywhere on the screen. WSR has trouble with some Windows applications, and a fair number of website objects.
The big thing for me is that Dragon lets me control just about everything I do on my computer (once I get past the logon screen): starting and closing applications, and mouse movement and clicking functions by voice; again, anywhere, in any application. I can do it all. Although it would be extraordinarily cumbersome, I could even use it to do Photoshop. The one thing it fails at is any game that requires quick user reaction (like combat games).
I didn’t spend a lot of time with this aspect of things in WSR, but I know it is a lot less ubiquitous.
I’m very grateful that speech recognition exists as an effective tool, because I cannot make productive use of the keyboard and mouse anymore. Although it might seem odd, while I cannot type, I can still write legibly with a pen. Or, to my tremendous satisfaction, with a stylus and a Tablet PC, which has tremendously accurate handwriting recognition built in. I still prefer Dragon, but writing in longhand on a touchscreen that does handwriting recognition is ideal for those situations where dictation aloud is not an option.
I will also mention, by the way, that both Dragon and WSR are very flexible about microphones. On my computer I have a headset microphone, and an array microphone that sits on the desk in front of me. The headset microphone is more accurate, but my movement is restricted to the length of the microphone cable; the array microphone allows me to stand, stretch, pace, while I am composing. A wireless headset microphone would probably solve the same problem, but I’ve not been motivated to spend the money for that.