Every day, we edit and repair our conversations. We hesitate, we restart, we correct ourselves and each other on the fly. But what we do so naturally and collaboratively in human-to-human interaction is difficult to replicate in dialogue with devices. At Alana, we believe that building this rapid repair capability into voice technology will be one of the biggest milestones in the future of conversational AI, and we’re keen (and on track) for Alana to lead the way.
Huge advancements in conversational AI mean we can now engage in more natural and useful chat with voice technology. For example, virtual agents now have more natural sounding voices and can respond accurately to an increasingly wide range of requests.
But there’s something that’s still missing from the chat function of devices – the ability to pick up on and respond appropriately to nuances in conversation that signal misunderstandings, or to handle spontaneous corrections midway through spoken sentences. When these mistakes and repairs go unnoticed, conversations stray off track. Most of the time this leads to frustration, but it can also be dangerous. Imagine you’re driving and have to restart and rephrase your request over and over to play the song you want or to set a navigation system correctly.
This post discusses why miscommunication happens and what makes it such a challenge for conversational AI technology.
Repairing misunderstanding in conversation
Successful conversation relies on speaker and listener working together, each investing effort in understanding the other.
When people speak naturally, they often hesitate, make false starts, pause, or correct themselves midway – this is because spoken conversation happens in real-time and is very spontaneous.
In addition, as human listeners in a conversation, we can easily mishear, misunderstand, or fail to decipher the correct meaning of a spoken sentence.
As a result, in human-to-human conversations, we work together to edit and repair these derailments in a line of chat by using a combination of verbal and non-verbal actions. For example, when we talk to a friend, a quizzical “huh” accompanied by a raise of the hands is a universally recognised cue that prompts us to adapt our response or to ask a clarifying question to fix a wayward interaction.
Currently, conversational AI agents struggle to recognise and adapt to miscommunication. This means that when using virtual assistant devices, users need to plan what they’re going to say in advance and say it in one go. They need to do this precisely, without pause or hesitation, and without mistakes. When they’re unable to do this the first time around, they have to restart and try again.
The importance of common ground in conversation
The essence of miscommunication can be explained by the ‘common ground’ theory of Herbert H. Clark. Based on this concept, there are four key stages in a successful conversation:
- Attend – listener notices what’s said
- Perceive – listener hears what’s said
- Understand – listener understands what’s said
- Accept – listener shows they understand the meaning of what’s said
In a successful conversation, common ground accumulates as the interaction progresses through these stages and moves on. As part of this process, listeners actively signal where they are in the process so that common ground can be reached. When common ground isn’t established at one of these stages, miscommunication happens and the conversation needs to be repaired. This is currently a challenge for conversational AI.
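To make the four stages a little more concrete, here is a minimal sketch of how a system might decide which kind of repair to attempt. It assumes the system can tell, for each stage, whether it was reached; the `Stage` enum, the `REPAIR_PROMPTS` wording and the function names are all hypothetical and only illustrate the idea. They are not a description of how any real assistant works.

```python
from enum import IntEnum
from typing import Dict, Optional


class Stage(IntEnum):
    """The four stages listed above, in the order they must succeed."""
    ATTEND = 1
    PERCEIVE = 2
    UNDERSTAND = 3
    ACCEPT = 4


# Hypothetical repair prompts, one for each stage at which grounding can fail.
REPAIR_PROMPTS = {
    Stage.ATTEND: "Sorry, were you talking to me?",
    Stage.PERCEIVE: "Sorry, I didn't catch that. Could you say it again?",
    Stage.UNDERSTAND: "I heard you, but I'm not sure what you mean.",
    Stage.ACCEPT: "Just to check, is that what you want me to do?",
}


def first_failed_stage(signals: Dict[Stage, bool]) -> Optional[Stage]:
    """Return the earliest stage that was not reached, or None if all were."""
    for stage in Stage:  # iterates in definition order
        if not signals.get(stage, False):
            return stage
    return None


def next_move(signals: Dict[Stage, bool]) -> str:
    """Proceed when common ground is established, otherwise ask for a repair."""
    failed = first_failed_stage(signals)
    if failed is None:
        return "proceed"
    return REPAIR_PROMPTS[failed]
```

The point of the sketch is that the repair depends on where grounding broke down: a listener who didn't hear asks for a repeat, while one who heard but didn't understand asks for a rephrase.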
The three main ways we repair conversations
First, some context to the concept of repairing conversations. There are three main ways we deal with miscommunication as it arises in the course of spoken conversation.
1. Disfluencies & self-correction
Spontaneous, spoken conversations are fast, unrehearsed and unedited. This inevitably means ‘clean’, flawlessly phrased conversations are rare. Instead, our spoken sentences are filled with pauses, hesitations, restarts and self-corrections that interrupt the flow. These ‘disfluencies’ – however natural and obvious they feel to us – can cause conversational AI to flounder.
Example – An AI conversation derailed by an interruption to flow
(square brackets [] indicate overlapping speech)
User: Alexa, play some, uhhh
Alexa: [Sorry, I don’t understand that]
User: [The Doors]
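In the exchange above, the assistant appears to treat the filled pause as the end of the turn. As a rough illustration of the alternative, the sketch below holds the turn open while the transcript so far ends in a filler. The filler list and the function names are assumptions made for this example, and a real system would rely on far richer acoustic and linguistic cues than a word list.

```python
import re

# Assumed filler words, stored with repeated letters collapsed ("hmm" -> "hm").
FILLERS = {"uh", "um", "er", "erm", "ah", "hm"}


def is_filler(token: str) -> bool:
    """True for filled pauses such as "uh", "uhhh" or "Umm,"."""
    cleaned = token.lower().strip(".,!?")
    collapsed = re.sub(r"(.)\1+", r"\1", cleaned)  # "uhhh" -> "uh"
    return collapsed in FILLERS


def should_wait(partial_transcript: str) -> bool:
    """Keep listening if the speaker has trailed off on a filler."""
    tokens = partial_transcript.split()
    return bool(tokens) and is_filler(tokens[-1])
```

With this rule, `should_wait("Alexa, play some, uhhh")` is true, so the assistant would keep listening instead of replying, and becomes false once the user adds "The Doors".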
2. Clarification
It’s normal for people to misunderstand each other in everyday conversation. In a noisy environment or when we’re distracted, we mishear things or interpret them in a different way to what was intended. As humans, we can solve this problem promptly with a quizzical look and a clarification question that suits the context. But conversational AI can struggle to pick up on the cues that something is awry and therefore fails to respond appropriately.
Example – A conversation that proceeds after a clarification interaction
User: Alana, where’s my cane?
Alana: The metal one?
User: No, the wooden one
Alana: It’s right next to you on the floor
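The clarification step in this exchange can be pictured as a simple rule: answer when exactly one object matches the request, and ask a question when more than one does. The sketch below is only an illustration of that rule. The `Item` class, its fields and the `respond` function are hypothetical, and it does not describe how Alana is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """A hypothetical object the assistant knows about."""
    kind: str
    material: str
    location: str


def respond(kind: str, known_items: List[Item], material: str = "") -> str:
    """Answer if the request is unambiguous, otherwise ask for clarification."""
    matches = [
        item for item in known_items
        if item.kind == kind and (not material or item.material == material)
    ]
    if not matches:
        return f"I can't see a {kind}."
    if len(matches) > 1:
        # Ambiguous reference: offer one candidate as a clarification question.
        return f"The {matches[0].material} one?"
    return f"It's {matches[0].location}"


canes = [
    Item("cane", "metal", "by the door"),
    Item("cane", "wooden", "right next to you on the floor"),
]
print(respond("cane", canes))                     # asks which cane is meant
print(respond("cane", canes, material="wooden"))  # answers directly
```

The second call stands in for the user's reply ("No, the wooden one"), which narrows the candidates to one so the conversation can proceed.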
3. Late correction
Sometimes, despite our best efforts, we misunderstand each other and only pick up on this several exchanges down the line, when the misunderstanding is revealed in some way. Conversational AI currently has no ‘late correction’ mechanism of this kind, which prevents devices from backtracking and repairing a conversation with the ease a human can. It also prevents users from being able to correct themselves.
Example – Failure to recognise late self-correction
User: Drive to Ealing ah sorry uhh Richmond, London
Sat Nav: Okay, driving to Ealing Broadway, London
User: Cancel navigation
Sat Nav: Navigation has been stopped
User: Drive to Richmond, London
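To show what handling the first utterance in this example might involve, here is a deliberately naive sketch. It assumes that an explicit cue word such as "sorry" signals that the single word before it should be replaced by what follows. The cue list, the filler list and the one-word assumption are all ours, chosen for illustration; real self-corrections are far more varied than this rule can capture.

```python
import re

# Assumed word lists for illustration only.
FILLERS = {"uh", "um", "er", "erm", "ah", "hm"}  # repeated letters collapsed
CORRECTION_CUES = {"sorry", "oops", "rather"}


def _clean(token: str) -> str:
    return token.lower().strip(".,!?")


def is_filler(token: str) -> bool:
    return re.sub(r"(.)\1+", r"\1", _clean(token)) in FILLERS


def is_cue(token: str) -> bool:
    return _clean(token) in CORRECTION_CUES


def apply_self_correction(utterance: str) -> str:
    """Drop fillers; if a cue word appears, also drop the word it corrects."""
    tokens = utterance.split()
    out = []
    i = 0
    while i < len(tokens):
        if is_filler(tokens[i]) or is_cue(tokens[i]):
            # Consume the whole run of fillers and cue words.
            j = i
            saw_cue = False
            while j < len(tokens) and (is_filler(tokens[j]) or is_cue(tokens[j])):
                saw_cue = saw_cue or is_cue(tokens[j])
                j += 1
            # A cue followed by more words replaces the previous word.
            if saw_cue and out and j < len(tokens):
                out.pop()
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(apply_self_correction("Drive to Ealing ah sorry uhh Richmond, London"))
```

Under these assumptions the utterance is reduced to "Drive to Richmond, London" before it reaches the navigation system, so the user would not need to cancel and start again.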
Giving Alana undo, redo and edit functionality
As the use of conversational AI expands, we’re becoming less tolerant of our devices’ inability to recognise miscommunication gracefully or adapt their responses when we self-correct. We find ourselves saying, “no, I meant this”, unable to resist the temptation to correct our device even when we know the system can’t handle this type of command. Essentially, we’re starting to expect conversational systems to have the undo, redo and edit functionality that we enjoy in text editing software.
At Alana, we can see that advancing conversational AI to handle miscommunication is a crucial milestone in the evolution of voice technology. So, our research team is invested in evolving Alana to lead the way when it comes to integrating conversation repair mechanisms into its feature set. Watch this space.