Communicating at any length through our computers still takes place primarily through some physical or virtual manifestation of the QWERTY keyboard, a device invented for the early typewriters of the 19th century and supposedly designed, intentionally, to slow typists down even then.
This is not where we want to be in the 21st century.
The vision many of us grew up with comes from Star Trek, where the Captain would say “Computer, what alien races inhabit this system?” – and the Enterprise’s electronic brain would speak up and give him a report. This is where we want to be in the 21st century. And we no doubt will, but not just yet. There’s a lot of century still ahead.
Why not yet? For some years now our computers have been evolving ever higher levels of AI, but better input methods are lagging. They are lagging because, while data analysis and processing are things computers do exceedingly well, speech recognition and natural language understanding are notoriously difficult for anything but our biological brain. We are seeing progress: Siri, Cortana, and the nameless Google speech search app are hearing our questions and giving us some useful answers; but for serious natural language work you need IBM’s Watson, a $3 million specialized supercomputer.
Still, progress is being made at an accelerating pace: Microsoft announced in October 2016 that its speech recognizer had achieved human parity in conversational speech recognition; that is, it had attained the error rate of a human listener (who, you may be surprised to learn, errs on 5.9% of the words). This is a huge achievement on the input side of NLP.
The next big hurdle is understanding the words that have been recognized, what they usually mean, and what the human uttering them meant (which may vary depending on whether the human was joking, sarcastic, angry, and so forth). There is a way to go here, but progress is being made. I remember yelling at an airline’s automatic voice response system “I want to speak to a human being!”, and it let me. It recognized my anger from my voice!
But let’s fast forward to the day when computers do attain perfect understanding of our intentions from our speech. What then? In particular: what will it mean for our computer-mediated communications such as email? It certainly won’t be a simple move from keyboard to microphone. Just imagine the cacophony in an “open space” office floor if everyone were to talk aloud to their machines! There was even a joke about it in the nineties: the disgruntled employee who runs down the central aisle yelling “Select all! Delete! Yes!”.
Here are some thoughts, then, about email in the era of voice recognition.
- We may limit the technology at first to dictation; that is, you’d dictate an email to the computer, then go over it and edit it with your keyboard. This matters because any message of more than a line or two requires editing, certainly if it involves anything critical or emotionally sensitive. The time gained from talking instead of typing may then be negligible, and I’m not sure the exertion would be less. Considering the time people put into typing today, we might all get a sore throat…
In fact, email dictation is fully available today, and very few people (other than those with disabilities) make use of it.
- To be useful in this modality, the computer would have to be truly smart. You’d want to tell it what to do and not dictate the words. Think of the old-style admins or secretaries: the boss could say “Send Mr. Jones a letter of rejection like the one we sent Mr. Smith, but make it a bit more cordial than that one”; the secretary would do the rest. In 20 years, computers might be able to do that, and the voice recognition would be used for instructing them, not for typing.
- But perhaps by then computers will get SO smart that we won’t need most of the emails. Perhaps people’s computers, like loyal servants, will communicate directly with each other to manage most of the interactions that now require human agency connected through email or other messages. Of course, there would still be love letters…
- And who knows – maybe once the technology comes of age, we’ll come up with a totally novel way of applying it. At the rate computing progresses this may well be what the future holds – and hides from us.
Basically, what I’m saying is that the speech-capable computers in our future will be a completely different breed from the machines we use today; they will do wonderful things, not simply what they do today but using voice. And it looks like we won’t have to wait long to see if I’m right…
Star Trek, voice interaction and the future of email - Nathan Zeldes, December 14, 2016