The Six Common Challenges of Underperforming Intents

Underperforming chatbot intents are a challenge for anyone working to improve their conversational AI. In this blog, our Data Model Analyst Alison Houston shares the six most common challenges she comes across and how they can be fixed.

If you’re working on chatbots, you’re probably wondering how to improve your chatbot intents and where to start with underperforming ones. I’ve built numerous chatbots myself and analysed many client chatbots to provide helpful tips for improving the performance of failing bots. Over the years, I’ve come to understand that underperformance is usually down to some problematic intents within the bot falling into one (or more) of six common challenges:

  • Weak concepts

  • Intent too wide 

  • Overlapping intents 

  • Lack of training data 

  • Imbalance of training data 

  • Human error  

Weak concepts 

A weak concept is where the most important phrase or set of words within an utterance is not represented strongly within the intent’s training data, and therefore the bot’s learning value is weak.

This is probably the most common issue I see, and it can easily be rectified. The general rule I find works well is: for every concept within your intent, aim to have three utterances. So, if I have an intent covering ways to get in contact with a company, a few example utterances might be:

  • I want your email address 

  • Give me your telephone number 

  • What are your contact details? 

We have three concepts here: email address, telephone number and contact details, so each concept needs another two utterances added. Each concept will then be reinforced, and the bot’s learning value is increased. This means that however a user asks for the company email address, for instance, the likelihood of returning the correct intent, with high confidence, is greater.
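As a rough sketch of this three-utterances-per-concept rule, here is how you might check an intent's training data for weak concepts. The concept names, keywords and data layout are hypothetical examples for illustration, not part of any particular NLP provider's API:

```python
# Hypothetical check for the "three utterances per concept" rule.
# Concepts are identified here by simple keyword matching, which is
# an illustrative assumption; real tooling would be more robust.

CONCEPT_KEYWORDS = {
    "email": ["email"],
    "telephone": ["telephone", "phone", "call"],
    "contact details": ["contact"],
}

def weak_concepts(utterances, min_per_concept=3):
    """Return concepts represented by fewer than min_per_concept utterances."""
    counts = {concept: 0 for concept in CONCEPT_KEYWORDS}
    for utterance in utterances:
        text = utterance.lower()
        for concept, keywords in CONCEPT_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                counts[concept] += 1
    return [concept for concept, n in counts.items() if n < min_per_concept]

training = [
    "I want your email address",
    "Give me your telephone number",
    "What are your contact details?",
]
print(weak_concepts(training))  # → ['email', 'telephone', 'contact details']
```

With only one utterance each, all three concepts are flagged as weak; adding two more utterances per concept would clear the report.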

Intent too wide  

When an intent covers many different subject areas, it is at risk of being too wide. It probably has many more training phrases than the rest of the model and could be compromising other intents. This can happen when an intent attracts a lot of traffic, so it is tempting for the chatbot builder to keep picking training phrases from the customer logs and adding them to that intent during the training and improvement process.

An intent should have a clear purpose and be focused. You should be able to describe the purpose of the intent in one sentence. If you’re finding that your intent is getting a little too unwieldy, don’t be afraid to split it into smaller, more manageable intents. Each of these smaller intents will then be highly focused, and your bot is more likely to understand them.
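One simple way to spot a candidate for splitting is to compare each intent's size against the rest of the model. This is a minimal sketch, assuming training data is held as a plain dict of intent name to utterance list (the ratio threshold is an illustrative assumption, not a provider recommendation):

```python
from statistics import median

def oversized_intents(model, ratio=3.0):
    """Flag intents with far more utterances than the model's median size,
    a possible sign that an intent is too wide."""
    sizes = {name: len(utterances) for name, utterances in model.items()}
    mid = median(sizes.values())
    return [name for name, n in sizes.items() if n > ratio * mid]

# Hypothetical model: one intent has accumulated far more training phrases.
model = {
    "contact_us": ["..."] * 40,
    "opening_hours": ["..."] * 10,
    "returns_policy": ["..."] * 12,
}
print(oversized_intents(model))  # → ['contact_us']
```

A flagged intent is not automatically wrong, but it is worth reading its training phrases to see whether they still describe one purpose in one sentence.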

Overlapping intents 

This is when two or more intents have similar meaning or cover similar subject areas, and therefore will probably have related training data, causing confusion in the model. You may see this problem in your bot if it consistently returns the same incorrect intent, or consistently returns the correct intent but with low confidence (or a combination of both symptoms).

If it’s vital to have two separate intents because they provide different answers, then you will need to carefully train each intent to ensure the set of utterances for each is very clearly defined and separated from each other to prevent overlaps.   

In some cases, it might be impossible to separate the two intents clearly enough, and the best course of action is to merge them into one intent and provide the user with a single answer covering all aspects of the utterances. You could also consider using entities within the merged intent to trigger different answers within the same intent.
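To find the overlapping utterances in the first place, a crude word-overlap score between the two intents' training data can help. This is only a sketch, using Jaccard similarity on word sets as a stand-in for whatever similarity measure your tooling provides; the intents and threshold are hypothetical:

```python
def jaccard(a, b):
    """Word-set overlap between two utterances, between 0 and 1."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def overlapping_pairs(intent_a, intent_b, threshold=0.4):
    """Return utterance pairs across two intents with high word overlap."""
    return [
        (ua, ub)
        for ua in intent_a
        for ub in intent_b
        if jaccard(ua, ub) >= threshold
    ]

refunds = ["how do I get a refund", "can I get my money back"]
returns = ["how do I return an item", "can I get a refund for this"]
pairs = overlapping_pairs(refunds, returns)
print(pairs)  # → [('how do I get a refund', 'can I get a refund for this')]
```

Pairs like the one flagged here are the utterances to rewrite, move, or merge so each intent's training data stays clearly separated.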

Lack of training data 

This occurs when you have fewer than five utterances for an intent. Most NLP providers recommend a minimum of five utterances; however, I find aiming for fifteen to twenty good utterances works well for most providers. With some intents you might find five utterances is enough, especially basic ones like greetings and pleasantries: hello, goodbye, thanks, etc. After all, there are only so many ways you can say hello.
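These two thresholds, the five-utterance floor and the fifteen-to-twenty target, can be checked in one pass over the model. A minimal sketch, again assuming a dict of intent name to utterance list:

```python
def underpopulated_intents(model, minimum=5, target=15):
    """Split intents into those below the provider minimum and those
    above the minimum but below a healthier target."""
    below_min = [name for name, u in model.items() if len(u) < minimum]
    below_target = [
        name for name, u in model.items() if minimum <= len(u) < target
    ]
    return below_min, below_target

# Hypothetical model sizes for illustration.
model = {
    "contact_us": [f"contact utterance {i}" for i in range(16)],
    "greeting": ["hi", "hello", "hey", "good morning", "howdy"],
    "refunds": ["I want a refund", "give me my money back", "refund please"],
}
below_min, below_target = underpopulated_intents(model)
print(below_min)     # → ['refunds']
print(below_target)  # → ['greeting']
```

The second list is where judgement applies: a greeting intent sitting at five utterances is probably fine, while a subject-matter intent at five is worth topping up.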

Imbalance of training data 

This is where a particular intent contains too many repeated common phrases like CAN I HAVE, TELL ME ABOUT or HOW DO I within its training data. For instance, if one intent has a large amount of training data containing the phrase CAN I HAVE, you are likely to pull the prediction of user questions towards this intent rather than the correct one. This is because the phrase CAN I HAVE has been reinforced so much that the bot has incorrectly learned that the most important part of the training data is CAN I HAVE.

It’s easy to get into the bad habit of creating patterns within an intent, so always bear this in mind and try to make the utterances as varied as possible. Similarly, try to vary the personal pronoun by including some utterances with “we” instead of “I”, or “us” instead of “me”.
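Repeated leading phrases like CAN I HAVE are easy to surface automatically. As a sketch, counting each utterance's first three words shows whether one opener dominates an intent's training data (the three-word window is an illustrative choice):

```python
from collections import Counter

def common_phrase_counts(utterances, n=3):
    """Count the most frequent leading n-word phrases in an intent's
    training data; one dominant phrase suggests an imbalance."""
    leads = [" ".join(u.lower().split()[:n]) for u in utterances]
    return Counter(leads).most_common()

utterances = [
    "can i have a refund",
    "can i have my order status",
    "can i have the opening hours",
    "where is my parcel",
]
print(common_phrase_counts(utterances))
# → [('can i have', 3), ('where is my', 1)]
```

Here three of four utterances open with CAN I HAVE, so rephrasing some of them would stop that phrase being over-reinforced.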

Human error 

This is where training data has been added to the wrong intent by mistake, or the exact same training data has been added to more than one intent (although some NLP providers make this impossible, alerting you with an error message). It happens; humans are not infallible! And quite often it’s because more than one person is working on the bot.

There is no simple solution to this, but thorough testing of your bot, and continuous monitoring once it is live, will usually alert you to these types of anomalies in your training data.
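For the duplicated-utterance case specifically, a quick scan across the whole model catches exact repeats even when your provider doesn't flag them. A minimal sketch, matching utterances case-insensitively across a hypothetical dict-of-intents model:

```python
def cross_intent_duplicates(model):
    """Find utterances that appear, case-insensitively, in more than
    one intent's training data."""
    seen = {}
    for intent, utterances in model.items():
        for u in utterances:
            seen.setdefault(u.lower().strip(), set()).add(intent)
    return {u: sorted(intents) for u, intents in seen.items() if len(intents) > 1}

model = {
    "refunds": ["I want a refund", "how do I return an item"],
    "returns": ["How do I return an item", "where do I send a return"],
}
print(cross_intent_duplicates(model))
# → {'how do i return an item': ['refunds', 'returns']}
```

Running a check like this as part of regular bot testing, especially when several people edit the same model, catches these slips before they reach users.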


Chatbots are being used everywhere now; there is enormous potential in more intelligent bots, and we’re all trying harder to improve our bots’ performance. Hopefully you’ll be able to identify that some of your own struggling intents fall into one of the above problem types, which may make them easier to rectify.

QBox is a tool that can also help to identify your weak intents, analyse the reason for those weaknesses and dramatically improve your understanding of the performance of your model. To find out more, visit 


This diagram is a screenshot of QBox in action. It is the correctness distribution diagram and shows a typical example of an intent that is possibly too wide, i.e. trying to cover too many topics. The circle in the middle represents the focused intent; the circles on the outside represent intents that are being confused with it. The data points represent the utterances: green dots represent correct predictions and red dots represent incorrect predictions.


This diagram is another screenshot from QBox and shows a typical example of an intent that is possibly overlapping with another intent. There is a cluster of data points, representing the utterances for the focused intent, all being confused with another intent.

Ready to give QBox a try? Get 5 free tests here. 

Drop us a line to schedule a demonstration here. 

See an overview of what QBox can do here. 

Alison Houston

Alison is our Data Model Analyst and builds and trains chatbot models for clients. She also provides advice and troubleshooting support for clients who are struggling with the performance of their own chatbots.

Follow me on LinkedIn