aNewDomain.net — In his continuing series covering speech recognition, our Lamont Wood asks the ultimate question. Of two leading options — Nuance Communications’ Dragon NaturallySpeaking (DNS) and Windows’ built-in Windows Speech Recognition (WSR) — which one is better? And at what? Deep dive …
For general-purpose desktop dictation you have two choices: Windows Speech Recognition (WSR), an unheralded feature that’s actually built into Microsoft Windows, or Dragon NaturallySpeaking (DNS), an off-the-shelf software package from Nuance Communications.
So the question is, which one is better?
Well, WSR is free and is already installed. But you can’t use it until you acquire and install a suitable headset microphone and then figure out how to invoke the software.
As for Dragon NaturallySpeaking, you have to pay for DNS and install it. Yet it comes with the necessary microphone, not to mention far more extensive documentation.
After that top-line comparison, the easiest way to find out which program is better at what it was built for is to see how the two perform in actual document production. I put each to the test. Here’s my product review shootout, pitting Windows Speech Recognition and Dragon NaturallySpeaking head-to-head.
For purposes of comparison, I used a widely-circulated government document known as “Lincoln’s Gettysburg Address.” Typed, it has 271 words. Pronouncing all its punctuation marks — as you have to do with speech recognition — makes it 314 words long. Also for the purposes of comparison I used dictionary words, not five-keystroke words.
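To make the difference between the typed count and the spoken count concrete, here is a minimal Python sketch. It assumes that each punctuation mark is dictated as exactly one utterance; the function name `spoken_length` and the simple regular expressions are illustrative assumptions, not the method used to arrive at the 271 and 314 figures. It counts only the opening sentence of the address, and it ignores dashes and paragraph commands, which would also have to be spoken.

```python
import re

def spoken_length(text: str) -> tuple[int, int]:
    """Return (typed_words, spoken_utterances) for a passage.

    Assumption: every punctuation mark is spoken as one utterance
    ("comma", "period", and so on). Hyphenated words are split apart
    by this simple pattern, so treat the result as an estimate.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    marks = re.findall(r"[.,;:?!]", text)
    return len(words), len(words) + len(marks)

# Opening sentence of the Gettysburg Address.
sample = ("Four score and seven years ago our fathers brought forth on this "
          "continent, a new nation, conceived in Liberty, and dedicated to the "
          "proposition that all men are created equal.")

typed, spoken = spoken_length(sample)
print(f"typed words: {typed}, spoken utterances: {spoken}")
```

For this one sentence the sketch reports 30 typed words and 34 spoken utterances, because the three commas and the period each have to be pronounced.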
First, I timed how long it took to type it in. Keyboarding the document’s 271 words took 314 seconds, representing about 52 words per minute. There were 25 errors, giving an accuracy rate of 91 percent. Correcting brought the time to 475 seconds, for a throughput of 34 words per minute. So there’s the bar. Now compare that to what it took in Dragon NaturallySpeaking and Windows Speech Recognition.
Dragon NaturallySpeaking: Dictating the 314 utterances with DNS, I finished the draft in 149 seconds, for a rate of 126 words per minute. There were two recognition errors, for 99.4 percent accuracy.
Correcting the errors brought the process to 255 seconds, for a throughput of 74 words per minute.
Windows Speech Recognition: Dictating the 314 utterances with WSR, I finished the draft in 164 seconds, for a rate of 115 words per minute. There were seven errors, for 97.8 percent accuracy.
Correcting the errors brought the process to 258 seconds, for a throughput of 73 words per minute.
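For readers who want to check the arithmetic, the accuracy and after-correction throughput figures can be reproduced with a few lines of Python. This is only a sketch of the calculation: the helper names `accuracy_pct` and `throughput_wpm` are made up for illustration, and the inputs are the word counts, error counts, and total times quoted above.

```python
def accuracy_pct(utterances: int, errors: int) -> float:
    """Share of words or utterances that came out right, as a percentage."""
    return 100.0 * (utterances - errors) / utterances

def throughput_wpm(utterances: int, total_seconds: float) -> float:
    """Words per minute once correction time is included."""
    return utterances * 60.0 / total_seconds

# (count, errors, seconds including corrections), taken from the tests above.
trials = {
    "Keyboard": (271, 25, 475),
    "DNS": (314, 2, 255),
    "WSR": (314, 7, 258),
}

for name, (count, errors, seconds) in trials.items():
    print(f"{name}: {accuracy_pct(count, errors):.1f}% accurate, "
          f"{throughput_wpm(count, seconds):.0f} wpm after corrections")
```

Note that the keyboard figures are based on 271 typed words, while the two dictation figures are based on 314 spoken utterances.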
Analysis and caveats
I had long felt that DNS was a little more accurate than WSR, and the results bear that out. Also, I have long felt that using either approximately doubles my throughput, and the results bear that out, too.
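The "approximately doubles" claim is a simple ratio of the after-correction throughputs. A quick Python sketch, using the rounded figures from the tests (34, 74, and 73 words per minute) and a hypothetical helper named `speedup`:

```python
def speedup(new_wpm: float, baseline_wpm: float) -> float:
    """How many times faster the new method is than the baseline."""
    return new_wpm / baseline_wpm

KEYBOARD_WPM = 34  # keyboard throughput after corrections

for name, wpm in (("DNS", 74), ("WSR", 73)):
    print(f"{name}: {speedup(wpm, KEYBOARD_WPM):.1f}x keyboard throughput")
```

Both ratios come out a little above 2, which is consistent with the doubling described here.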
But there are caveats to consider.
The first caveat is that I am quite skilled at using both systems. I was alone in a quiet room for both tests and, after all, Lincoln’s Gettysburg Address does not present a challenging vocabulary. Your results will vary.
Meanwhile, this comparison involved dictation — that is, the input of text composed by someone other than you. Speed and accuracy are helpful with such tasks. But most office workers are trying to compose original text, so what matters is the system’s ability to keep up with them as they decide what to say.
How fast can you decide what to say? A solid day’s work for a professional writer is something like 1,000 to 2,000 publishable words.
Using either DNS or WSR you ought to be done for the day in about 20 minutes. Obviously, it doesn’t work that way.
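The 20-minute figure is just the day's word count divided by the dictation throughput. Here is a rough Python sketch of that division, assuming the 74 words per minute measured for DNS after corrections; the name `minutes_needed` is illustrative only.

```python
def minutes_needed(words: int, wpm: float) -> float:
    """Minutes of continuous dictation needed to produce `words` words."""
    return words / wpm

DICTATION_WPM = 74  # DNS throughput after corrections, from the test above

for target in (1000, 1500, 2000):
    print(f"{target} words: {minutes_needed(target, DICTATION_WPM):.1f} minutes")
```

The midpoint of the range, 1,500 words, works out to roughly 20 minutes of nonstop dictation, which is where the estimate comes from.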
In my experience, the rate at which you compose coherent, grammatical text is about 50 words a minute in sporadic bursts.
During the intervening periods you will be staring at the blank page, sweating blood. Anyone who’s completed a professional typing class should be able to type 50 words a minute. And doubtless people have produced 2,000 words a day using quill pens.
The real advantage …
Because speech recognition can’t accelerate the decision-making process — and because keyboarding is fast enough for composition (at least for me) — you must ask yourself what the advantage of using speech recognition really is.
In my experience, its advantage is that it makes the composition process essentially effortless. You decide what you’re going to say, you say it, and it’s there on the screen. The keyboard is no longer a stumbling block — nor a source of intractable muscle problems.
But you still have to get there from here. Converting to speech recognition is a non-trivial task that requires a commitment of several weeks, during which your productivity may fall. The trick is to remember the effort it took to learn to type originally. Learning speech recognition will not be as demanding, and the payoff will be much greater.
For aNewDomain.net, I’m Lamont Wood.
Based in San Antonio, Texas, Lamont Wood is a senior editor at aNewDomain.net. He’s been covering tech for trade and mainstream publications for almost three decades now, and he’s a household name in Hong Kong and China. His tech reporting has appeared in innumerable tech journals, including the original BYTE (est. 1975). Email Lamont at Lamont@anewdomain.net or follow him @LAMONTwood.