Oct 20, 2021

How Can I Generate Chatbot Utterances?

Generating chatbot utterances can be a challenge for anyone looking to improve their conversational AI. In this blog, our Data Model Analyst Alison Houston shares advice and best practice.

Chatbot builders often have trouble generating good quality training data for their bot. But what defines “good quality”? This is the golden question that all of us chatbot builders would like to learn more about. I’ve listed below some problem areas that stop us generating good quality utterances, and some possible solutions.

We can get tunnel vision when generating utterances. If all the utterances come from one chatbot builder, working so closely on the project can narrow their view, and the training data might not be as varied as it could be. What tends to work well is, once the bootstrap training data has been generated, to get colleagues, friends or family (anyone not directly involved in the project) to help. Simply give them a brief explanation of each intent (but not too much detail) and ask them to list some utterances showing how they would ask about it. Hopefully this will give a good variety.

We might not use the correct terminology, especially if the bot covers more technical subjects. Consider the end user: for example, if experts will be using your bot, you need to craft your utterances in a way that reflects this. Get the expertise of an SME (Subject Matter Expert) if necessary, to ensure that correct terminology is used in the utterances.

We easily get into the habit of creating patterns within an intent. For example, here’s an extract of some utterances in an intent about how to contact a company:

Can I have your telephone number?
Can I have your email address?
Can I have your website address?
Can I have your mailing address?
Give me your contact details?
Give me your e-mail?
Give me your site details?

These utterances would mislead the NLP engine into thinking that the “Can I have” and “Give me your” part of each phrase is the most important part of this intent, and there is a danger it could artificially skew that intent over another. Try to make the utterances as varied as possible. Similarly, try to vary the personal pronoun by including some utterances with “we” instead of “I”, or “us” instead of “me”. Here are some examples of improvements to the above utterances:

I would like your telephone number
Could we have the (company name) email address?
I want your website address
We need the company mailing address
Give us your contact details
What is your e-mail?
Can you give the site details?
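
The opener patterns described above can be spotted with a quick count before the data goes anywhere near the NLP engine. The following is a minimal sketch, not a feature of any particular tool: the function name `opener_share` and the 30% threshold are assumptions chosen for illustration.

```python
from collections import Counter


def opener_share(utterances, n_words=3):
    """Return the share of utterances that start with each n-word opener."""
    openers = Counter(
        " ".join(u.lower().split()[:n_words]) for u in utterances if u.strip()
    )
    total = sum(openers.values())
    return {opener: count / total for opener, count in openers.most_common()}


def flag_openers(utterances, threshold=0.30, n_words=3):
    """Keep only openers used by more than `threshold` of the intent.

    The 30% default is an arbitrary, illustrative cut-off.
    """
    shares = opener_share(utterances, n_words)
    return {opener: share for opener, share in shares.items() if share > threshold}


original = [
    "Can I have your telephone number?",
    "Can I have your email address?",
    "Can I have your website address?",
    "Can I have your mailing address?",
    "Give me your contact details?",
    "Give me your e-mail?",
    "Give me your site details?",
]

improved = [
    "I would like your telephone number",
    "Could we have the (company name) email address?",
    "I want your website address",
    "We need the company mailing address",
    "Give us your contact details",
    "What is your e-mail?",
    "Can you give the site details?",
]

flagged_original = flag_openers(original)
flagged_improved = flag_openers(improved)
print(flagged_original)  # openers that dominate the original extract
print(flagged_improved)  # empty when no opener dominates
```

On the original extract both “can i have” and “give me your” are flagged, while the improved list produces no flags because every utterance opens differently.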

We fall into the trap of using real user questions which may be far too long and/or contain irrelevant items. It’s great to use real customer questions if you’re lucky enough to have them; however, they’re not always good quality. Users tend to be very chatty when using chatbots and sometimes they don’t get straight to the point of their intent. When using real questions, cut the waffle and make each one into a brief and clearly expressed piece of training data.

Other things to bear in mind to help generate good quality utterances:

Utterances should be of varying lengths, although even for the longer utterances you should ideally aim for fewer than 12 words
Entity placement needs to be varied so your bot understands context. For example, “Tell me about the safety features of your hybrid cars”, “Do the hybrid cars have good safety features”, “Hybrid car safety feature information please” (hybrid being the entity here)
Include some utterances with plurals and various tenses (past, present and future if applicable). For example, “I’m looking to buy a car”, “We’re interested in buying a car”, “Do you have any electric cars for sale?” (although some NLP engines can analyse semantic features in utterances)
Include some utterances with punctuation and some without (some NLP engines will normalize punctuation or have an optional function to do so)
Ensure there are no unintentional typos in your utterances. However, it might be good practice to include commonly misspelt words in some of them (some NLP engines have an autocorrect feature which can be activated)
A thesaurus is an invaluable tool, which will help you include a variety of synonyms for the key concepts within the intents
Obviously, once your chatbot is live, you’ll be monitoring the real user questions coming in, and this will help to generate new utterances where there are learning or knowledge gaps in your bot (carefully edited to ensure they are good quality ones, of course!)
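
Several of the points in this list can be checked mechanically before a human review. The sketch below is one possible way to do that, assuming whitespace-separated words and a single-word entity; the function names, the 12-word limit as a hard cut-off, and the sample utterances (including the deliberately chatty last one) are illustrative assumptions.

```python
import string

MAX_WORDS = 12                   # aim for fewer than 12 words per utterance
SENTENCE_PUNCT = set(".,?!;:")   # apostrophes in contractions are ignored


def review_intent(utterances, max_words=MAX_WORDS):
    """Summarise simple quality signals for one intent's utterances."""
    too_long = [u for u in utterances if len(u.split()) >= max_words]
    punctuated = [u for u in utterances if any(ch in SENTENCE_PUNCT for ch in u)]
    return {
        "too_long": too_long,
        "punctuated": len(punctuated),
        "unpunctuated": len(utterances) - len(punctuated),
        "distinct_lengths": sorted({len(u.split()) for u in utterances}),
    }


def entity_word_index(utterance, entity):
    """Return the word position of a single-word entity, or None if absent."""
    words = [w.strip(string.punctuation).lower() for w in utterance.split()]
    return words.index(entity.lower()) if entity.lower() in words else None


utterances = [
    "Tell me about the safety features of your hybrid cars",
    "Do the hybrid cars have good safety features?",
    "Hybrid car safety feature information please",
    "Hi there, I was wondering whether you could tell me a bit about hybrid car safety",
]

report = review_intent(utterances)
positions = [entity_word_index(u, "hybrid") for u in utterances]
print(report)
print(positions)
```

A wide spread in `positions` suggests the entity placement is varied, and anything listed under `too_long` is a candidate for trimming.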

There are also utterance generator tools that can be used. Some generate utterances from an intent name and a syntax that you supply, creating variations based on that syntax; others generate utterances based on your existing training data. QBox is an example of the latter: it generates utterances based on your existing data.
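
To give a feel for the syntax-based approach, here is a minimal sketch that expands a template into every combination of slot values. It does not represent how QBox or any specific tool works; the `{slot}` syntax, the `expand` function, and the slot values are all assumptions made for illustration.

```python
from itertools import product


def expand(template, slots):
    """Fill every {slot} in the template with each combination of values."""
    names = [name for name in slots if "{" + name + "}" in template]
    combos = product(*(slots[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


# Hypothetical slot values for a "contact details" intent.
slots = {
    "request": ["I would like", "We need", "Could you send us"],
    "item": ["your telephone number", "the company mailing address"],
}

generated = expand("{request} {item}", slots)
for utterance in generated:
    print(utterance)
```

Generated variations still need a human edit: a single template repeats the same sentence structure, which is exactly the kind of pattern that can skew an intent.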

To find out more, visit QBox.ai.

Take the QBox course, designed by Alison, here.

Alison Houston

Alison is our Data Model Analyst and builds and trains chatbot models for clients. She also provides advice and troubleshooting support for clients who are struggling with the performance of their own chatbots.
