Understanding the importance of having a balanced chatbot model
Whether you are just starting to build a chatbot model or training an existing one to optimize performance, it is important to ensure each intent in your model has roughly the same number of utterances. Balanced intents mean a balanced model, which allows for better accuracy in detecting and returning the correct intent.
If some intents in your chatbot model contain very few utterances and others contain vast amounts, this strong imbalance can lead the classifier to make very biased decisions, because it tends to be drawn towards the intents with the larger number of utterances; we call these ‘greedy intents’. As a result, the smaller intents struggle to perform, as they are competing with intents that have far more utterances.
We see a lot of customers’ chatbot models that are very imbalanced: they have many intents with fewer than 10 utterances and some intents with 200+ utterances. This is usually part of the reason for poor performance. The imbalance tends to happen naturally rather than on purpose, so it’s very important to keep monitoring the model for any signs of it. However, we understand that in the real world it’s not always possible to have the same number of utterances per intent. So, we usually advise our customers to try to keep all intents within the same range; for example, between 30 and 80 utterances per intent could be a good range to aim for. If they do have the odd intent with far more utterances than that range, we ask: is it necessary to have that much data?
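As a rough illustration of the range check described above, the sketch below counts utterances per intent and flags any intent that falls outside a target range. It assumes the training data is available as a plain dictionary mapping intent names to lists of utterances; the function name, the data layout and the 30 and 80 defaults are illustrative assumptions, not part of QBox or any particular chatbot platform.

```python
def find_imbalanced_intents(training_data, low=30, high=80):
    """Group intents whose utterance count falls outside [low, high].

    training_data: dict mapping intent name -> list of utterances
    (a hypothetical layout chosen for this example).
    """
    report = {"too_few": {}, "too_many": {}}
    for intent, utterances in training_data.items():
        count = len(utterances)
        if count < low:
            report["too_few"][intent] = count
        elif count > high:
            report["too_many"][intent] = count
    return report


# Example with made-up data: one small, one in-range and one oversized intent.
example_data = {
    "greeting": ["hi"] * 5,
    "check_balance": ["what is my balance"] * 50,
    "general_faq": ["tell me about your services"] * 200,
}
example_report = find_imbalanced_intents(example_data)
```

Running a check like this regularly is one simple way to spot an imbalance as it creeps in; the right thresholds will depend on your own model.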
We usually find that such an intent contains many utterances that are almost identical, with just one- or two-word differences. These do not add much learning value to the model, so we recommend removing a lot of them. We also find oversized intents that are trying to cover too much. We recommend dividing these into subcategories, if possible, to create smaller, more manageable intents.
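One possible way to surface near-identical utterances is to compare them word by word and flag pairs that differ by only one or two words. The sketch below does this with a word-level edit distance; the function names and the threshold of two words are assumptions made for illustration, and this is not necessarily how QBox or your NLP provider measures similarity.

```python
from itertools import combinations


def word_edit_distance(a, b):
    """Levenshtein distance between two token lists, counted in whole words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete a word from a
                            curr[j - 1] + 1,      # insert a word from b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def near_duplicates(utterances, max_word_diff=2):
    """Return pairs of utterances that differ by at most max_word_diff words."""
    tokenised = [u.lower().split() for u in utterances]
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(tokenised), 2):
        if word_edit_distance(a, b) <= max_word_diff:
            pairs.append((utterances[i], utterances[j]))
    return pairs


# Example with made-up utterances from a single intent.
sample = [
    "I want to check my balance",
    "I want to check my account balance",
    "Where is the nearest branch",
]
flagged = near_duplicates(sample)
```

Note that very short utterances (one or two words) will almost always be flagged against each other, and the pairwise comparison grows quadratically with the size of the intent, so treat the output as a list of candidates to review by hand rather than something to delete automatically.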
By having a balanced list of intents, you will be more likely to have an unbiased chatbot that achieves better results.
Our new ‘Model balancing’ feature in QBox indicates which intents in your model are causing an imbalance (because they have either too much or too little training data). The information is presented in an easy-to-understand way, so you can quickly identify any problem intents.
Our latest Model balancing feature in QBox
Get in touch with us to see a demo of QBox.
P.S. If you want to learn more about the latest trends in training data and model balancing, please have a look at our annual NLP report.