Has the news of Microsoft’s acquisition of Nuance brought the topic of conversational AI into your boardroom? Conversational assistants are definitely a hot topic at the moment. And with pressure to join the voice-first revolution mounting, businesses are suddenly becoming acutely aware of the potential deficiencies of their data.
What types of data do we need to support artificial intelligence? Do we have enough data? Will our data need to be restructured? Where can we get more data? Boardrooms are currently swimming with data-related dilemmas. So, in this article, we talk about how this information is used to train conversational assistants, so you can assess whether you have the right foundations to support your AI ambitions and what you can do if you don’t.
What is the role of data in Conversational AI?
We expect a lot from Conversational AI nowadays. We trust the Just Eat AI assistant to deliver our takeaways to our exact specifications. Amazon’s AI agent can help us find our missing parcels without us having to engage with a human at all. And there are increasing opportunities to use AI to engage visitors to your website by helping them find exactly the answer they need before they consider straying onto a competitor’s site.
But how do you train a computer to know what questions to ask to drill into our exact requirements? And where does its knowledge come from in the first place?
As humans, we learn from our mistakes. To enable computers to behave in this way, they need to be spoon-fed this experience in the form of vast amounts of carefully labelled data. This enables your AI assistant to recognise patterns, dissect language, learn from previous behaviours and make autonomous decisions so it can respond in an appropriate and human-like fashion.
For example, to train Alana to ask the appropriate follow-up questions to decipher the intent behind a query, we take unstructured information from a range of internal and external sources. Then, with careful labelling we build a repository of machine-readable training data. This feeds the machine learning model that allows an Alana-powered assistant to adeptly steer any conversation in the right direction.
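To make the idea of labelled, machine-readable training data concrete, here is a minimal sketch in Python. The utterances, intent labels and matching logic are all illustrative assumptions for this example, not Alana's actual data schema or model; a production system would use a trained machine learning classifier rather than simple word overlap.

```python
from collections import Counter

# Hypothetical labelled training data: each raw utterance is paired with
# the intent a human annotator assigned to it.
TRAINING_DATA = [
    ("where is my order", "track_parcel"),
    ("my parcel hasn't arrived yet", "track_parcel"),
    ("I want to change my delivery address", "update_address"),
    ("can you deliver to a different address", "update_address"),
    ("what time do you close today", "opening_hours"),
    ("are you open on Sunday", "opening_hours"),
]

def classify(utterance: str) -> str:
    """Pick the intent whose examples share the most words with the query."""
    query_words = set(utterance.lower().split())
    scores = Counter()
    for example, intent in TRAINING_DATA:
        scores[intent] += len(query_words & set(example.lower().split()))
    return scores.most_common(1)[0][0]

print(classify("has my parcel arrived"))  # → track_parcel
```

Even this toy version shows why labelling matters: the assistant's ability to map a new phrasing onto a known intent comes entirely from the examples humans have tagged for it.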
Do we have enough data?
Data efficiency and how we train language-generating AI machines to work from much smaller amounts of data is an important area of research Alana is leading. But for now, it’s undeniable that large volumes of data remain a prerequisite for almost anything to do with artificial intelligence.
For example, you may have heard it took 341 GB of data and 2.6 billion parameters to train Google’s voice assistant, Meena, to chat intelligently about virtually any topic. But did you know KLM used a more modest (by AI standards) 60,000 questions from its customers to train its BlueBot to book tickets and handle common customer FAQs?
As you can see, the data demands of a project vary considerably – there simply isn’t a single useful ballpark to guide you on how much data you need to train a digital assistant. But, as a general guide, a simple chatbot solution that works on rigid rules and provides short, scripted answers is less data-hungry. However, once you move into teaching a computer to ask follow-up questions and use more conversational language, the data demands increase.
What data works well with conversational AI?
Where does this data come from? The initial training dataset tells your AI assistant what people will be saying to it and allows it to work out how to respond.
This data comes in many forms and often already exists within your organisation. You just might need some help knowing where to look. For example, with a narrow customer support remit, you may be able to create a script for a simple customer services AI assistant using data you already have in your CRM system. And if you want to adopt an interactive approach to answering customer FAQs, conversations extracted from emails and call centre transcripts can help train your AI assistant to handle common queries.
So, the short answer is, you may already have a great foundation level of data, but the challenge is organising this information into a format that a machine can read and use.
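As a rough illustration of that organising step, the sketch below turns a raw call-centre transcript into machine-readable training examples. The transcript text, the `AGENT:`/`CUSTOMER:` labelling convention and the output field names are all assumptions made for this example, not a required format.

```python
import json
import re

# Hypothetical raw transcript, as it might be exported from a call-centre system.
RAW_TRANSCRIPT = """\
AGENT: Good morning, how can I help?
CUSTOMER: Hi, I'd like to know when my refund will arrive.
AGENT: Refunds normally take 3-5 working days.
CUSTOMER: Great, thanks.
"""

def transcript_to_examples(raw: str) -> list[dict]:
    """Pair each customer turn with the agent reply that follows it."""
    turns = [
        (m.group(1), m.group(2).strip())
        for m in re.finditer(r"(AGENT|CUSTOMER): (.+)", raw)
    ]
    examples = []
    for (speaker, text), (next_speaker, reply) in zip(turns, turns[1:]):
        if speaker == "CUSTOMER" and next_speaker == "AGENT":
            examples.append({"query": text, "response": reply})
    return examples

# Emit one JSON object per training example - a common machine-readable format.
for example in transcript_to_examples(RAW_TRANSCRIPT):
    print(json.dumps(example))
```

The structure here (query paired with response) is the kind of format a machine can actually learn from, which is the gap between "we have lots of transcripts" and "we have training data".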
How to get your data AI ready
Do you have too much data and find yourself grappling with the challenge (and cost) of how to organise it? Maybe you’re worried too little data will allow bias to creep into your assistant’s responses? And what exactly are the correct ethics when it comes to using the data you have gathered?
In reality, the data we collect in our everyday operations is messy, incomplete and unlikely to have the structure demanded by machine learning algorithms. But this doesn’t have to be a barrier to achieving your conversational AI ambitions.
That’s where we can help you.