The potential of conversational AI is clear and exciting – but only if we can rely on the accuracy of the information shared during an interaction.
Access to large language models has vastly extended the scope and chat capability of conversational AI technology. But, as the ability to build human-like relationships with end-users emerges, so too do complex considerations around the accuracy of the content shared. As a result, ensuring user safety is now arguably one of the most pressing challenges for anyone involved in designing open source conversational AI destined for mainstream use.
Training an AI on large amounts of data enables it to generate responses that are often indistinguishable from those of a human. This has opened up many positive applications for conversational AI, from providing more personalised assistance to customers in a commercial setting, to facilitating companion technology that offers casual camaraderie, to a search paradigm with the potential to knock Google off the top spot. Unfortunately, it also empowers AI agents to respond convincingly even when what they’re saying is incorrect or questionable.
What do we mean by safety?
We’ve all heard the stories – AI agents that can hold a decent conversation but are liable to blurt out instructions on how to make bombs or reinforce offensive stereotypes. Open source conversational AI generates human-sounding responses by extracting relevant content, often from unmoderated data repositories (e.g. internet search engines or social media). Unfortunately, when scraping such vast sources, it’s feasible that these systems pick up toxic behaviour and messages along the way. This can lead them to respond with information that isn’t correct or that’s skewed by an unfair or potentially damaging bias. And because the response is fluent and eloquent, users can easily be convinced and misled by these answers. There are two main aspects of conversational AI safety we’re investigating at Alana:
- Factual correctness
More advanced and ‘human-like’ forms of conversational AI can generate responses by extracting and then piecing together results retrieved from a search on trusted sources. This so-called ‘abstractive’ approach might involve paraphrasing or combining several sources to answer a query more directly. Overall, the approach is highly effective, but the end response can still be factually incorrect or misleading – a phenomenon known as ‘hallucination’ in neural generative models. In most cases, this will cause nothing more than annoyance – for example, the agent conflates two people with the same name and provides information based on a confusing mixture of the two. Other times, the consequences can be more serious – for example, if a virtual agent combines conflicting treatment information in its answer to a user investigating health symptoms.
- Prejudice and partiality
A downside of conversational AI trained on large language models is that it can regurgitate biases and mimic toxic parts of society. In terms of short-term harm, this might mean slurs that insult the user. But there’s also longer-term harm to consider, such as reinforcing negative stereotypes and spreading conspiracy theories or extremist and discriminatory views. All of this holds the potential to cause real-world harm.
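To make these two failure modes concrete, here is a deliberately minimal sketch of the kind of guardrail a generated response could pass through before being shown to a user. Everything in it – the function names, the overlap threshold, the placeholder blocklist – is illustrative only, not a description of any production pipeline; real systems use trained toxicity classifiers and entailment models rather than token overlap and keyword lists.

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def support_score(answer, sources):
    """Fraction of answer tokens that appear in any retrieved source.
    A low score suggests the answer contains content the sources
    don't back up - a crude proxy for hallucination."""
    source_vocab = set()
    for s in sources:
        source_vocab.update(tokens(s))
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 1.0
    supported = sum(1 for t in answer_tokens if t in source_vocab)
    return supported / len(answer_tokens)

# Placeholder list of disallowed terms; a real system would use a
# trained classifier, not a keyword list.
BLOCKLIST = {"exampleslur"}

def is_safe(answer, sources, min_support=0.8):
    """Crude guardrail: reject blocklisted or poorly supported answers."""
    if any(t in BLOCKLIST for t in tokens(answer)):
        return False
    return support_score(answer, sources) >= min_support

sources = ["Aspirin can thin the blood and is not advised before surgery."]
print(is_safe("Aspirin can thin the blood.", sources))            # True
print(is_safe("Aspirin cures all infections quickly.", sources))  # False
```

The point of the sketch is the shape of the check, not the method: the factual-correctness risk is handled by grounding the answer in retrieved sources, and the prejudice risk by screening the output itself, and both checks run before anything reaches the user.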
Currently we’re facing a trade-off
The advances in conversational AI at the moment come with a trade-off. By giving agents access to huge scraped data sets, we can significantly extend their breadth of ‘knowledge’ and give them the reference points to respond in a convincing, conversational tone. This adds breadth, depth and personality to our conversations with machines and generates more satisfying and enjoyable interactions. Unfortunately, this capability also opens conversational AI up to the safety issues described above.
In some settings, this trade-off can be acceptable – for example, when the AI is playing a controversial character in a computer game, the risk of it slipping up with the odd offensive remark may be seen as preferable to a safe but unengaging interaction. In contrast, in a care setting, it’s absolutely crucial that there’s no chance an AI could output anything that causes harm, anger or upset.
Recent advancements have confirmed the capability and potential of AI powered by large language models to convincingly mimic human language and interactions. In the Alana lab we’re now building technology that has the capability to show empathy, build rapport, and offer companionship. We can see this creates fantastic potential, but we also know this opportunity can only be maximised if we address the safety risks posed by inaccuracy and bias.
In short, safety has to be at the heart of future advancements in conversational AI. We’re developing the scope of Alana to play a bigger role in search, companionship and decision-making. But running in tandem, we’re also developing methods that build safety into our solutions, so we can give users confidence in the process of how Alana AI formulates every response.