Finding data you don't have

The challenge with chatbots is that they need training data to work, and that data needs to reflect the exact language customers use. One interesting statistic, published by a chatbot provider in its 2017 AI Summit San Francisco presentation, was that chatbot intent usage followed the 80:20 rule (80% of questions being handled by 20% of the intents). The problem, however, was that users' unique use of language meant that, with standard training data and FAQ-style chatbot technology, matching the "question" to the "answer" failed over 30% of the time. The answer may have been there, but getting an AI system to match it was not so easy.
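To make the 80:20 idea concrete, here is a minimal Python sketch that takes a log of matched intents and works out how few intents cover a given share of questions. The function name, the intent names and the counts are all made up for illustration; they are not figures from the presentation.

```python
from collections import Counter

def intents_for_coverage(intent_log, target=0.8):
    """Return the most-used intents that together cover `target` of all questions."""
    counts = Counter(intent_log)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    covered = 0
    selected = []
    # Walk intents from most to least used until the target share is reached.
    for intent, n in counts.most_common():
        if covered / total >= target:
            break
        selected.append(intent)
        covered += n
    return selected, covered / total

# Hypothetical log: 10 intents, 1000 questions in total.
log = ["billing"] * 500 + ["delivery"] * 300
for i in range(8):
    log += [f"other_{i}"] * 25

top, share = intents_for_coverage(log)
print(top, share)
```

Running the same check on your own chatbot logs shows which handful of intents deserve the most training data first.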

A 30% failure rate is pretty catastrophic for a service that already has challenges winning over customers. So what is the answer?

It turns out that training data is the key to successful intent matching. That may sound obvious, but the hard part is working out how real people talk or, more specifically, how real people talk to your chatbot.


The resolution might be easy: look at archived transcripts of previous conversations. The more, the merrier. Despite the effort of sanitizing this kind of data for export and processing, it can be worthwhile because it should get to the nub of your users' language style.
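As a rough idea of what that sanitizing step can look like, the Python sketch below pulls the customer's lines out of a transcript and masks email addresses and phone numbers. It assumes each transcript line looks like "Speaker: text", and the two patterns are deliberately simple illustrations rather than a complete list of personal data to remove.

```python
import re

# Deliberately simple patterns; real transcripts will need more than these.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def sanitize(text):
    """Mask email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    text = PHONE.sub("<PHONE>", text)
    return text

def customer_utterances(transcript_lines, speaker="customer"):
    """Keep only the given speaker's lines, sanitized, assuming 'Speaker: text'."""
    prefix = speaker.lower() + ":"
    utterances = []
    for line in transcript_lines:
        if line.lower().startswith(prefix):
            utterances.append(sanitize(line[len(prefix):].strip()))
    return utterances

# Hypothetical transcript for illustration.
transcript = [
    "Agent: Hi, how can I help?",
    "Customer: my parcel hasn't turned up, email me at jo@example.com",
    "Customer: or ring 07700 900123",
]
print(customer_utterances(transcript))
```

What comes out is the customer's own wording, with the personal details masked, which is exactly the raw material intent training needs.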

Language repositories

There is a range of repositories that we can draw on to see how people speak. They may not relate to your industry or your product (and that can cause some issues), but they will still help in adapting to the style of communication we need.

Our solution: we have done the processing that enables us to easily generate alternative training data that maps to a much wider range of language use. Give us a try and contact us to see how we can solve your data needs.
